Preface to the Second Edition


The changes from the first edition to the second have two sources: the many helpful 
suggestions the author has received from colleagues, reviewers, students and others 
who took the time and effort to contact him, and the author’s experience teaching with
this text in the years since the first edition was published. 

Though the bulk of the text has remained unchanged from the first edition, there 
are a number of changes, large and small, that will hopefully improve the text. As 
always, any remaining problems are solely the fault of the author. 


Changes from the First Edition to the Second Edition 


(1) A new section about the foundations of set theory has been added at the end 
of Chapter 3, about sets. This section includes a very informal discussion of 
the Zermelo–Fraenkel Axioms for set theory. We do not make use of these
axioms subsequently in the text, but it is valuable for any mathematician to 
be aware that an axiomatic basis for set theory exists. Also included in this 
new section is a slightly expanded discussion of the Axiom of Choice, and 
a new discussion of Zorn’s Lemma.


(2) Chapter 6, about the cardinality of sets, has been rearranged and expanded. 
There is a new section at the start of the chapter that summarizes various 
properties of the set of natural numbers; these properties play important roles 
subsequently in the chapter. The sections on induction and recursion have 
been slightly expanded, and have been relocated to an earlier place in the 
chapter (following the new section), both because they are more concrete than 
the material found in the other sections of the chapter, and because ideas from 
the sections on induction and recursion are used in the other sections. Next 
comes the section on the cardinality of sets (which was originally the first 
section of the chapter); this section gained proofs of the Schroeder–Bernstein
theorem and the Trichotomy Law for Sets, and lost most of the material about 
finite and countable sets, which has now been moved to a new section devoted
to those two types of sets. The chapter concludes with the section on the
cardinality of the number systems. 


(3) The chapter on the construction of the natural numbers, integers and rational
numbers from the Peano Postulates was removed entirely. That material
was originally included to provide the needed background about the number 
systems, particularly for the discussion of the cardinality of sets in Chapter 6, 
but it was always somewhat out of place given the level and scope of this text. 
The background material needed for Chapter 6 has now been summarized in a
new section at the start of that chapter, making the chapter both self-contained 
and more accessible than it previously was. The construction of the number 
systems from the Peano Postulates more properly belongs to a course in real 
analysis or in the foundations of mathematics; the curious reader may find 
this material in a variety of sources, for example [Blo11, Chapter 1].


(4) Section 3.4 on families of sets has been thoroughly revised, with the focus
being on families of sets in general, not necessarily thought of as indexed.


(5) A new section about the convergence of sequences has been added to Chapter 7.
This new section, which treats a topic from real analysis, adds some
diversity to Chapter 7, which had hitherto contained selected topics of only 
an algebraic or combinatorial nature. 


(6) A new section called “You Are the Professor” has been added to Chapter 8.
This new section, which includes a number of attempted proofs taken from
actual homework exercises submitted by students, offers the reader the op- 
portunity to solidify her facility for writing proofs by critiquing these sub- 
missions as if she were the instructor for the course. 


(7) The notation for images and inverse images of sets under a function, defined
in Section 4.2, has been changed from the non-standard notation f_*(P) and
f^*(Q) used in the first edition to the standard notation f(P) and f^{-1}(Q),
respectively. Whereas the author still finds the notation used in the first edi- 
tion superior in terms of avoiding confusion with inverse functions, he has 
deferred to requests from colleagues and reviewers to switch to the standard 
notation, with the hope that any confusion due to the standard notation will 
be outweighed by the benefit for students in preparing to read mathematical 
texts that use the standard notation. 


(8) All known errors have been corrected. 


(9) Many minor adjustments of wording have been made throughout the text,
with the hope of improving the exposition.


Errors


Although all known errors from the first edition have been corrected, there are likely 
to be some remaining undetected errors, and, in spite of the author’s best effort, 
there are likely to be some errors in the new sections and revisions of older material 
that were written for the second edition. If the reader finds any such errors (which
will hopefully be few in number), it would be very helpful if they were sent
to the author at bloch@bard.edu. An updated list of errors is available at
http://math.bard.edu/bloch/proofs2_errata.pdf.


Preface to the First Edition


In an effort to make advanced mathematics accessible to a wide variety of students, 
and to give even the most mathematically inclined students a solid basis upon which 
to build their continuing study of mathematics, there has been a tendency in recent 
years to introduce students to the formulation and writing of rigorous mathematical 
proofs, and to teach topics such as sets, functions, relations and countability, in a 
“transition” course, rather than in traditional courses such as linear algebra. A tran- 
sition course functions as a bridge between computational courses such as calculus, 
and more theoretical courses such as linear algebra and abstract algebra. 

This text contains core topics that the author believes any transition course should 
cover, as well as some optional material intended to give the instructor some flexibility
in designing a course. The presentation is straightforward and focuses on the
essentials, without being too elementary, too pedagogical, or too full of
distractions.

Some of the features of this text are the following: 


(1) Symbolic logic and the use of logical notation are kept to a minimum. We 
discuss only what is absolutely necessary—as is the case in most advanced 
mathematics courses that are not focused on logic per se. 


(2) We distinguish between truly general techniques (for example, direct proof 
and proof by contradiction) and specialized techniques, such as mathematical 
induction, which are particular mathematical tools rather than general proof 
techniques. 


(3) We avoid an overemphasis on “fun” topics such as number theory, combi- 
natorics or computer science-related topics, because they are not as central 
as a thorough treatment of sets, functions and relations for core mathemat- 
ics courses such as linear algebra, abstract algebra and real analysis. Even 
the two sections on combinatorics in Chapter 7 were written with a focus on 
reinforcing the use of sets, functions and relations, rather than emphasizing 
clever counting arguments.


(4) The material is presented in the way that mathematicians actually use it rather
than in the most axiomatically direct way. For example, a function is a special 
type of relation, and from a strictly axiomatic point of view, it would make
sense to treat relations first, and then develop functions as a special case of 
relations. Most mathematicians do not think of functions in this way (except 
perhaps for some combinatorialists), and we cover functions before relations, 
offering clearer treatments of each topic. 


(5) A section devoted to the proper writing of mathematics has been included, to 
help remind students and instructors of the importance of good writing. 


Outline of the text 


The book is divided into three parts: Proofs, Fundamentals and Extras. At the end of 
the book is a brief Appendix summarizing a few basic properties of the real numbers, 
an index and a bibliography. The core material in this text, which should be included 
in any course, consists of Parts I and II (Chapters 1-6). A one-semester course can 
comfortably include all the core material, together with a small amount of material 
from Part III, chosen according to the taste of the instructor. 

Part I, Proofs, consists of Chapters 1 and 2, covering informal logic and proof 
techniques, respectively. These two chapters discuss the “how” of modern mathematics,
that is, the methodology of rigorous proofs as it is currently practiced by
mathematicians. Chapter 1 is a precursor to rigorous proofs, and is not about mathematical
proofs per se. The exercises in this chapter are all informal, in contrast to the rest of 
the book. Chapter 2, while including some real proofs, also has a good bit of informal 
discussion. 

Part II, Fundamentals, consists of Chapters 3-6, covering sets, functions, rela- 
tions and cardinality, respectively. This material is basic to all of modern mathemat- 
ics. In contrast to Part I, this material is written in a more straightforward defini- 
tion/theorem/proof style, as is found in most contemporary advanced mathematics 
texts. 

Part III, Extras, consists of Chapters 7 and 8, and has brief treatments of a variety
of topics, including groups, homomorphisms, partially ordered sets, lattices, combi- 
natorics and sequences, and concludes with additional topics for exploration by the 
reader, as well as a collection of attempted proofs (actually submitted by students) 
which the reader should critique as if she were the professor. 

Some instructors might choose to skip Section 4.5 and Section 6.4, the former be- 
cause it is very abstract, and the latter because it is viewed as not necessary. Though 
skipping either or both of these two sections is certainly plausible, instructors are 
urged to consider not doing so. Section 4.5 is intended to help students prepare for
dealing with sets of linear maps in linear algebra, and comparable constructions in 
other branches of mathematics. Section 6.4 is a topic that is often skipped over in the 
mathematical education of many undergraduates, and that is unfortunate, because
it prevents the all too common (though incorrect) attempt to define sequences “by 
induction.”


To the Student


This book is designed to bridge the large conceptual gap between computational 
courses such as calculus, usually taken by first- and second-year college students, 
and more theoretical courses such as linear algebra, abstract algebra and real anal- 
ysis, which feature rigorous definitions and proofs of a type not usually found in 
calculus and lower-level courses. The material in this text was chosen because it is, 
in the author’s experience, what students need to be ready for advanced mathematics 
courses. The material is also worth studying in its own right, by anyone who wishes 
to get a feel for how contemporary mathematicians do mathematics. 

Though we emphasize proofs in this book, serious mathematics is—contrary to 
a popular misconception—not “about” proofs and logic any more than serious lit- 
erature is “about” grammar, or music is “about” notes. Mathematics is the study of 
some fascinating ideas and insights concerning such topics as numbers, geometry, 
counting and the like. Ultimately, intuition and imagination are as valuable in math- 
ematics as rigor. Both mathematical intuition and facility with writing proofs can 
be developed with practice, just as artists and musicians develop their creative skills 
through training and practice. 

Mathematicians construct valid proofs to verify that their intuitive ideas are cor- 
rect. How can you be sure, for example, that the famous Pythagorean Theorem is 
true? There are infinitely many possible triangles, so no one can check whether the 
Pythagorean Theorem holds for all triangles by checking each possible triangle di- 
rectly. As you learn more abstract mathematical subjects, it will be even harder to be 
sure whether certain ideas that seem right intuitively are indeed correct. Hence we 
need to adhere to accepted standards of rigor. 

There are two foci in this text: proofs and fundamentals. Just as writing a novel 
ultimately relies upon the imagination, but needs a good command of grammar, as 
well as an understanding of the basics of fiction such as plot and character, so too 
for mathematics. Our “grammar” is logic and proof techniques; our “basics” are sets, 
functions, relations and so on. You will have to add your own imagination to the mix.


Prerequisites


A course that uses this text would generally have as a prerequisite a standard calculus 
sequence, or at least one solid semester of calculus. In fact, the calculus prerequisite 
is used only to ensure a certain level of “mathematical maturity,” which means
sufficient experience (and comfort) with mathematics and mathematical thinking.
Calculus per se is not used in this text (other than an occasional reference to it in 
the exercises); neither is there much of pre-calculus. We do use standard facts about 
numbers (the natural numbers, the integers, the rational numbers and the real num- 
bers) with which the reader is certainly familiar. See the Appendix for a brief list of 
some of the standard properties of real numbers that we use. On a few occasions we 
will give an example with matrices, though such examples can easily be skipped. 


Exercises 


As with music and art, mathematics is learned by doing, not just by reading texts
and listening to lectures. Doing the exercises in this text is the best way to get a 
feel for the material, to see what you understand, and to identify what needs further 
study. Exercises range from routine examples to rather tricky proofs. The exercises 
have been arranged in order so that in the course of working on an exercise, you 
may use any previous theorem or exercise (whether or not you did it), but not any 
subsequent result (unless stated otherwise). Some exercises are used in the text, and 
are so labeled. 


Writing Mathematics 


It is impossible to separate rigor in mathematics from the proper writing of proofs. 
Proper writing is necessary to maintain the logical flow of an argument, to keep 
quantifiers straight, and more. The reader would surely not turn in a literature paper 
written without proper grammar, punctuation and literary usage, and no such paper 
would be accepted by a serious instructor of literature. Please approach mathematics 
with the same attitude. (Proper writing of mathematics may not have been empha- 
sized in your previous mathematics courses, but as you now start learning advanced 
mathematics, you may have to adjust your approach to doing mathematics.) 

In particular, mathematicians write formal proofs in proper English (or whatever 
language they speak), with complete sentences and correct grammar. Even mathe- 
matical symbols are included in sentences. Two-column proofs, of the type used in 
some high school geometry classes, are not used in advanced mathematics (except 
for certain aspects of logic). So, beginning with Chapter 2, you should forget two- 
column proofs, and stick to proper English. In Chapter 1 we will be doing preparatory 
work, so we will be less concerned with proper writing there.


Mathematical Notation and Terminology


Just as mathematics is not “about” proofs and logic (as mentioned above), so too 
mathematics is not “about” obscure terminology and symbols. Mathematical ter- 
minology and symbols (such as Greek letters) are simply shorthand for otherwise 
cumbersome expressions. For example, it is much easier to solve the equation 
3x + 5 = 7 − 6x written in symbols than it is to solve the equation given by the phrase
“the sum of three times an unknown number and the number five equals the differ- 
ence between the number seven and six times the unknown number.” If we wrote out 
all of mathematics without symbols or specialized terminology, we would drown in 
a sea of words, and we would be distracted from the essential mathematical ideas. 
On the other hand, whereas the use of mathematical symbols is of great convenience, 
it is important to keep in mind at all times that mathematics is not the mere manipu- 
lation of symbols—every symbol means something, and it is that meaning in which 
we are ultimately interested. 
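To make the convenience concrete, here is a short worked solution of the symbolic equation 3x + 5 = 7 − 6x. It is a sketch added for illustration; the layout and the annotation are ours, and the verbal version of the problem would require the same steps carried out entirely in words.

```latex
\begin{align*}
3x + 5  &= 7 - 6x \\
3x + 6x &= 7 - 5  && \text{(add $6x$ to, and subtract $5$ from, both sides)} \\
9x      &= 2 \\
x       &= \tfrac{2}{9}
\end{align*}
```

As a check, substituting x = 2/9 gives 17/3 on both sides of the original equation.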

There is no central authority that determines mathematical notation, and varia- 
tions exist in the literature for the notation for some fundamental mathematical con- 
cepts; in this text we have adopted the most commonly used notation as much as 
possible. It should be noted that mathematical notation has evolved over time, and 
care is needed when studying older books and papers. 

To help with readability, we have added a few symbols that are analogs of the 
very useful (and widely used) end-of-proof symbol, which is □. This symbol lets the
reader know when a proof is done, signaling that the end is in sight, and allowing 
a proof to be skipped upon first reading. Mathematics texts are rarely read straight 
from beginning to end, but are gone over back and forth in whatever path the reader 
finds most helpful. In this book we decided to take a good thing and make it better, 
adding the symbol △ for the end of a definition, the symbol ◊ for the end of an
example, and the symbol /// for the end of scratch work or other non-proofs. The 
point of all these symbols is to separate formal mathematical writing, namely, proofs, 
definitions and the like, from the informal discussion between the formal writing. 

An important point to note concerning mathematical terminology is that whereas 
some names are invented specifically for mathematical use (for example the word 
“injective”), other mathematical terms are borrowed from colloquial English. For 
example, the words “group,” “orbit” and “relation” all have technical meanings in 
mathematics. It is important to keep in mind, however, that the mathematical usage 
of these words is not the same as their colloquial usage. Even the seemingly simple 
word “or” has a different mathematical meaning than it does colloquially. 
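The remark about “or” can be illustrated with a small truth table. The sketch below assumes the standard convention that the mathematical “or” is inclusive (true when at least one part is true), whereas colloquial English often intends the exclusive reading (true when exactly one part is true). The function names are hypothetical and chosen only for this illustration.

```python
from itertools import product


def inclusive_or(p: bool, q: bool) -> bool:
    """Mathematical "or" (assumed inclusive): true when at least one of p, q is true."""
    return p or q


def exclusive_or(p: bool, q: bool) -> bool:
    """A common colloquial reading of "or": true when exactly one of p, q is true."""
    return p != q


def truth_table():
    """Return rows (p, q, inclusive, exclusive) for all four truth assignments."""
    return [
        (p, q, inclusive_or(p, q), exclusive_or(p, q))
        for p, q in product([True, False], repeat=2)
    ]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

The two readings disagree only in the row where both parts are true, which is exactly the case in which colloquial and mathematical usage diverge.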


What This Text Is Not 


Mathematics as an intellectual endeavor has an interesting history, starting in
ancient civilizations such as Egypt, Greece, Babylonia, India and China, progressing
through the Middle Ages (especially in the non-Western world), and accelerating up 
until the present time. The greatest mathematicians of all time, such as Archimedes,
Newton and Gauss, have had no less of an impact on human civilization than their
non-mathematical counterparts such as Plato, Buddha, Shakespeare and Beethoven. 
Unbeknownst to many non-mathematicians, mathematical research is thriving today, 
with more active mathematicians and more published papers than in any previous 
era. For lack of space, we will not be discussing the fascinating history of mathe- 
matics in this text. See [Boy91], [Str87] or [Ang94] for a treatment of the history of 
mathematics. 

The study of mathematics raises some very important philosophical questions. 
Do mathematical objects exist? Do we discover mathematics or invent it? Is math- 
ematics universal, or a product of specific cultures? What assumptions about logic 
(for example, the Law of the Excluded Middle) should we make? Should set the- 
ory form the basis of mathematics, as is standard at present? We will not be dis- 
cussing these, and other, philosophical questions in this text, not because they are 
not important, but because it would be a diversion from our goal of treating certain 
fundamental mathematical topics. Mathematicians tend, with some exceptions, to be 
only minimally reflective about the philosophical underpinnings of their mathemat- 
ical activity; for better or worse, this book shares that approach. There is so much 
interesting mathematics to do that most mathematicians—who do mathematics for 
the joy of it—would rather spend their time doing mathematics than worrying about 
philosophical questions. 

The majority of mathematicians are fundamentally closet Platonists, who view 
mathematical objects as existing in some idealized sense, similar to Platonic forms. 
Our job, as we view it, is to discover what we can about these mathematical objects, 
and we are happy to use whatever valid tools we can, including philosophically con- 
troversial notions such as the Law of the Excluded Middle (see Section 1.2 for further 
discussion). Philosophers of mathematics, and those mathematicians prone to philos- 
ophizing, can be somewhat frustrated by the unwillingness of most mathematicians 
to deviate from the standard ways in which mathematics is done; most mathemati- 
cians, seeing how well mathematics works, and how many interesting things can be 
proved, see no reason to abandon a ship that appears (perhaps deceptively) to be very 
sturdy. In this text we take the mainstream approach, and we do mathematics as it is 
commonly practiced today (though we mention a few places where other approaches 
might be taken). For further discussion of philosophical issues related to mathemat- 
ics, a good place to start is [DHM95] or [Her97]; see also [GG94, Section 5.9]. For a 
succinct and entertaining critique of the standard approach to doing mathematics as 
described in texts such as the present one, see [Pou99]. 


To the Instructor 


There is a pair of opposing pedagogical imperatives when teaching a transition
course of the kind for which this text is designed: On the one hand, students often 
need assistance making the transition from computational mathematics to abstract 
mathematics, and as such it is important not to jump straight into water that is too 
deep. On the other hand, the only way to learn to write rigorous proofs is to write 
rigorous proofs; shielding students from rigor of the type mathematicians use will 
only ensure that they will not learn how to do mathematics properly. 

To resolve this tension, a transition course should maintain high
standards in content, rigor and writing, both by the instructor and by the students,
while also giving the students a lot of individual attention and feedback. Watering 
down the core content of a transition course, choosing “fun” topics instead of central 
ones, making the material easier than it really is, or spending too much time on 
clever pedagogical devices instead of core mathematics, will allow students to have 
an easier time passing the course, but will result in students who are not ready to 
take more advanced mathematics courses—which is the whole point of the transition 
course. 

When teaching students to write proofs, there is no substitute for regularly as- 
signed homework problems, and for regular, and detailed, feedback on the homework 
assignments. Students can learn from their mistakes only if the mistakes are pointed 
out, and if better approaches are suggested. Having students present their proofs to 
the class is an additional forum for helpful feedback. 

Most mathematicians of the author’s generation never had a transition course, 
and simply picked up the techniques of writing proofs, and the basics of such fun- 
damental topics as sets and functions, while they were taking courses such as linear 
algebra and abstract algebra. However, what worked for those who went on to be- 
come professors of mathematics does not always work for all students, and extra 
effort is needed to guide students until the basic idea of what constitutes a proof has 
sunk in. Hence, a dedicated focus on the formulation and writing of proofs, attention 
to the details of student work, and supportive guidance during this learning process 
are all very helpful to students as they make the transition to advanced mathematics.

One place where too much indulgence is given, however, even in more advanced
mathematics courses, and where such indulgence is, the author believes, quite mis- 
guided, involves the proper and careful writing of proofs. Seasoned mathematicians 
make honest mathematical errors all the time (as we should point out to our students), 
and we should certainly understand such errors by our students. By contrast, there is 
simply no excuse for sloppiness in writing proofs, whether the sloppiness is physical 
(hastily written first drafts of proofs handed in rather than neatly written final drafts) 
or in the writing style (incorrect grammar, undefined symbols, etc.). Physical sloppi- 
ness is often a sign of either laziness or disrespect, and sloppiness in writing style is 
often a mask for sloppy thinking. 

The elements of writing mathematics are discussed in detail in Section 2.6. It 
is suggested that these notions be used in any course taught with this book (though 
of course it is possible to teach the material in this text without paying attention to 
proper writing). The author has heard the argument that students in an introductory 
course are simply not ready for an emphasis on the proper writing of mathematics, 
but his experience teaching says otherwise: not only are students ready and able 
to write carefully no matter what their mathematical sophistication, but they gain 
much from the experience because careful writing helps enforce careful thinking. 
Of course, students will only learn to write carefully if their instructor stresses the 
importance of writing by word and example, and if their homework assignments and 
tests include comments on writing as well as mathematical substance. 


Part I 
PROOFS 


Mathematics, like other human endeavors, has both a “what” and a 
“how.” The “what” is the subject matter of mathematics, ranging from 
numbers to geometry to calculus and beyond. The “how” depends upon 
who is doing the mathematics. At the elementary school level, we deal 
with everything very concretely. At the high school level, when we learn 
algebra and geometry, things get more abstract. We prove some things, 
for example in geometry, and do others computationally, for example in
algebra. To a mathematician, by contrast, there is no split between how we
do algebra and how we do geometry: everything is developed axiomatically,
and all facts are proved rigorously. The methodology of rigorous
proofs done the contemporary way (quite different from the two-column
proofs sometimes used in high school geometry) is the “how” of mathematics,
and is the subject of this part of the text. In Chapter 1 we give a
brief treatment of informal logic, the minimum needed to construct sound 
proofs. This chapter is much more informal than the rest of the book, and 
should not be taken as a sign of things to come. In Chapter 2 we discuss 
mathematical proofs, and the various approaches to constructing them. 
Both of these chapters have a good bit of informal discussion, in contrast 
to some later parts of the book. 


1 Informal Logic


Logic is the hygiene the mathematician practices to keep his ideas healthy 
and strong. 
– Hermann Weyl (1885–1955)


1.1 Introduction 


Logic is the framework upon which rigorous proofs are built. Without some basic 
logical concepts, which we will study in this chapter, it would not be possible to 
structure proofs properly. It will suffice for our purposes to approach these logical 
concepts informally (and briefly). Though logic is the foundation of mathematical 
reasoning, it is important not to overemphasize the use of formal logic in mathemat- 
ics. Outside of the field of mathematical logic, proofs in mathematics almost never 
involve formal logic, nor do they generally involve logical symbols (although we will 
need such symbols in the present chapter). 

Logic is an ancient subject, going back in the West to thinkers such as Aristotle, 
as well as to ancient non-Western thinkers. Having originated as an analysis of valid 
argumentation, logic is strongly linked to philosophy. Mathematicians have devel- 
oped a mathematical approach to logic, although there is no rigid boundary between 
the study of logic by mathematicians and by philosophers; indeed, some logicians 
have excelled in both fields. Some aspects of logic have taken on new importance 
in recent years with the advent of computers, because logical ideas are at the basis 
of some aspects of computer science. For more about traditional logic, see [Cop68], 
which is very readable, and [KMM80], which is more formal. For mathematical 
logic, see [End72], [Mal79] or [EFT94]. See the introduction to Chapter 1 of the last
of these books for a discussion of the relation of mathematical logic to traditional 
logic. For an interesting discussion of logic, see [EC89, Chapters 19 and 20]. For a 
treatment of logic in the context of computer science, see [DSW94, Part 3].

Although the informal logic we discuss in this chapter provides the underpinning 
for rigorous proofs, informal logic is not in itself rigorous. Hence the present chapter 
is substantially different from the rest of the book in that it is entirely informal.
Because we start discussing mathematical proofs only in the next chapter, for now
our discussion is not written in the style appropriate for rigorous proofs. The same 
goes for the homework exercises in this chapter. 

In this chapter, and throughout this text, we will use the basic properties of the 
integers, rational numbers and real numbers in some of our examples. We will as- 
sume that the reader is informally familiar with these numbers. The basic properties 
of the natural numbers will be discussed briefly in Section 6.2. See the Appendix for 
a brief list of some of the standard properties of real numbers; see [Blo11, Chapters 1
and 2] for a detailed treatment of the standard number systems. 

The aspect of mathematics we are learning about in this text is how to state results,
such as theorems, and then prove them. Of course, a great deal of intuition, informal 
exploration, calculation and grunt work goes into figuring out what to try to prove, 
but that is another matter. Logic, at its most basic, is concerned with the construc- 
tion of well-formed statements and valid arguments; these two notions will form the 
logical framework for the proper stating and proving of theorems. The actual math- 
ematics of doing proofs will have to wait until Chapter 2. 


1.2 Statements 


When we prove theorems in mathematics, we are demonstrating the truth of certain 
statements. We therefore need to start our discussion of logic with a look at state- 
ments, and at how we recognize certain statements as true or false. A statement is 
anything we can say, write or otherwise express that is either true or false. For ex- 
ample, the expression “Fred Smith is twenty years old” is a statement, because it 
is either true or false. We might not know whether this statement is actually true or 
not, because to know that would require that we know some information about Fred 
Smith, for example his date of birth, and that information might not be available to 
us. For something to be a statement, it has to be either true or false in principle; it 
does not matter whether we personally can verify its truth or falsity. By contrast, the 
expression “Eat a pineapple” is not a statement, because it cannot be said to be either 
true or false. 

It is important to distinguish between English expressions that we might say, and 
the statements they make. For example, when we wrote “Fred Smith is twenty years 
old,” we could just as well have written “Fred Smith’s age is twenty.” These two 
English expressions are not identical because they do not have the exact same words, 
but they certainly make the same statement. For the sake of convenience, we will 
refer to expressions such as “Fred Smith is twenty years old” as statements, though
we should realize that we are really referring to the statement that the expression is 
making. In practice, there should not be any confusion on this point. 

We will be making two assumptions when dealing with statements: every state- 
ment is either true or false, and no statement is both true and false. The first of these 
assumptions, often referred to as the Law of the Excluded Middle (and known for- 
mally as bivalence), may seem innocuous enough, but in fact some mathematicians 
have chosen to work without this powerful axiom. The majority of mathematicians 
do use the Law of the Excluded Middle (the author of this book among them), and we
will not hesitate to use it implicitly throughout this book. One of the consequences 
of this law is that if a statement is not false, then it must be true. Hence, to prove that 
something is true, it would suffice to prove that it is not false; this strategy is very 
useful in some proofs. Mathematicians who do not accept the Law of the Excluded 
Middle would not consider as valid any proof that uses the law (though the incorrect- 
ness of a proof does not necessitate the falsity of the statement being proved, only 
that another proof has to be sought). See [Wil65, Chapter 10] or [Cop68, Section 8.7] 
for more discussion of these issues. 

If the only thing we could do with statements is to decide whether something 
is a statement or not, the whole concept would be fairly uninteresting. What makes 
statements more valuable for our purposes is that there are a number of useful ways of 
forming new statements out of old ones. An analog to this would be the ways we have 
of combining numbers to get new ones, such as addition and multiplication; if we did 
not have these operations, then numbers would not be very interesting. In this section 
we will discuss five ways of forming new statements out of old ones, corresponding 
to the English expressions: and; or; not; if, then; if and only if. The statements out of 
which we form a new one will at times be referred to as the component statements 
of the new statement. 

For our definitions of these five constructions, we let P and Q be statements. 

Our first construction, the conjunction of P and Q, which is denoted P ∧ Q, is
the statement that, intuitively, is true if both P and Q are true, and is false otherwise.
We read P ∧ Q as “P and Q.” The precise definition of P ∧ Q is given by the “truth
table”

P | Q | P ∧ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   F

This truth table, and all others like it, shows whether the new statement (in this case 
P ∧ Q) is true or false for each possible combination of the truth or falsity of each of
P and Q. 

As an example of conjunction, let P = “it is raining today,” and let Q = “it is
cold today.” The statement P ∧ Q would formally be “it is raining today and it is cold
today.” Of course, we could express the same idea more succinctly in English by 
saying “it is raining and cold today.” In general, we will try to use statements that 
read well in English, as well as being logically correct. 

The colloquial use of the word “and” differs from the mathematical usage stated 
above. The mathematical usage means the above truth table, and nothing else, while 
colloquially there are other meanings in addition to this one. One source of confusion 
involving the word “and” that is well worth avoiding is the colloquial use of this word 
in the sense of “therefore.” For example, it is not uncommon to find a sentence such 
as “From the previous equation we see that 3x < 6, and x < 2.” What is really meant 
by this sentence is “From the previous equation we see that 3x < 6, which implies 
that x < 2.” Such a use of “and” to mean “therefore” is virtually never necessary, and 
because it can lead to possible confusion, it is best avoided. It would be fine to say
“From the previous equation we see that 3x < 6, and x < 2,” because in that case the 
“and” is functioning only as the conjunction between the two parts of the sentence, 
and is not a substitute for the word “therefore.” 

Another colloquial use of “and” that differs from mathematical usage, though 
one that is less likely to cause us problems here, is seen in the statement “Fred and 
Susan are married.” Interpreted in the strict mathematical sense, we could only con- 
clude from this statement that each of Fred and Susan is married, possibly to different 
people. In colloquial usage, by contrast, this statement would almost always be inter- 
preted as meaning that Fred and Susan are married to each other. In literary writing, 
some measure of ambiguity, or some implied meaning that is not stated explicitly, is 
often valuable. In mathematics, on the other hand, precision is key, and ambiguity is 
to be avoided at all costs. When using a mathematical term, always stick to the pre- 
cise mathematical definition, regardless of any other colloquial usage. For example, 
in mathematical writing, if we wanted to indicate that Fred and Susan are married 
to each other, we should state explicitly “Fred and Susan are married to each other,” 
and if we want to state only that each of Fred and Susan is married, we should say 
“Fred is married and Susan is married.” 

Our second construction, the disjunction of P and Q, which is denoted P ∨ Q, is
the statement that, intuitively, is true if either P is true or Q is true or both are true,
and is false otherwise. We read P ∨ Q as “P or Q.” The precise definition of P ∨ Q is
given by the truth table

P | Q | P ∨ Q
T | T |   T
T | F |   T
F | T |   T
F | F |   F

The truth of the statement P ∨ Q means that at least one of P or Q is true. Though
we write P ∨ Q in English as “P or Q,” it is very important to distinguish the mathematical
use of the word “or” from the colloquial use of the word. The mathematical
use of the word “or” always means an inclusive “or,” so that if “P or Q” is true, then 
either P is true, or Q is true, or both P and Q are true. By contrast, the colloquial 
use of the word “or” often means an exclusive “or,” which does not allow for both P 
and Q to be true. In this text, as in all mathematical works, we will always mean an 
inclusive “or,” as given in the truth table above. 
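
The difference between the two readings of “or” can be checked mechanically. The following Python sketch is only an illustration and is not part of the formal development; the function names inclusive_or and exclusive_or are our own labels, and Python's True and False stand in for T and F.

```python
from itertools import product


def inclusive_or(p: bool, q: bool) -> bool:
    """Mathematical "or": true when at least one of p and q is true."""
    return p or q


def exclusive_or(p: bool, q: bool) -> bool:
    """Colloquial "either ... or ..., but not both"."""
    return p != q


# One row per combination of truth values, in the order TT, TF, FT, FF.
# Columns: P, Q, inclusive or, exclusive or.
for p, q in product([True, False], repeat=2):
    cells = (p, q, inclusive_or(p, q), exclusive_or(p, q))
    print(" | ".join("T" if value else "F" for value in cells))
```

The two functions disagree only in the row in which both P and Q are true, which is exactly the case that the exclusive reading rules out.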

A simple example of a disjunction is the statement “my car is red or it will rain 
today.” This statement has the form P ∨ Q, where P = “my car is red,” and Q = “it
will rain today.” The truth of this statement implies that at least one of the statements 
“my car is red” or “it will rain today” is true. The only thing not allowed is that both 
“my car is red” and “it will rain today” are false. 

Now consider the statement “tonight I will see a play or I will see a movie.” In 
colloquial usage it would be common to interpret this statement as an exclusive or, 
meaning that either I will see a play, or I will see a movie, but not both. In colloquial 
usage, if I wanted to include the possibility that I might see both a play and a movie, I 
would likely say “tonight I will see a play, or I will see a movie, or both.” By contrast,
in mathematical usage the statement “tonight I will see a play or I will see a movie” 
would always be interpreted as meaning that either I will see a play, or I will see a 
movie, or both. In mathematical usage, if I wanted to exclude the possibility that I 
might see both a play and a movie, I would say “tonight I will see a play or I will see 
a movie, but not both.” 

One other source of confusion involving the word “or” that is well worth avoiding 
is the colloquial use of this word in the sense of “that is.” Consider the colloquial
sentence “when I was in France I enjoyed eating the local fromage, or, cheese.” 
What is really meant is “when I was in France, I enjoyed eating the local fromage, 
that is, cheese.” Such a use of “or” is best avoided in mathematical writing, because 
it is virtually never necessary, and can lead to confusion. 

Our third construction, the negation of P, which is denoted ¬P, is the statement
that, intuitively, is true if P is false, and is false if P is true. We read ¬P as “not P.”
The precise definition of ¬P is given in the truth table


P | ¬P
T | F
F | T


Let P = “Susan likes mushy bananas.” It would not work in English to write ¬P
as “Not Susan likes mushy bananas,” both because that is not proper English, and
because it appears as if the subject of the sentence is someone named “Not Susan.”
The most straightforward way of negating P is to write ¬P = “it is not the case
that Susan likes mushy bananas.” While formally correct, this last statement is quite 
awkward to read, and it is preferable to replace it with an easier-to-read expression, 
for example “Susan does not like mushy bananas.” 

Our final two ways of combining statements, both of which are connected to the 
idea of logical implication, are slightly more subtle than what we have seen so far. 
Consider the statement “If Fred goes on vacation, he will read a book.” What would 
it mean to say that this statement is true? It would not mean that Fred is going on 
vacation, nor would it mean that Fred will read a book. The truth of this statement 
means only that if one thing happens (namely, Fred goes on vacation), then another 
thing will happen (namely, Fred reads a book). In other words, the one way in which 
this statement would be false would be if Fred goes on vacation, but does not read 
a book. The truth of this statement would not say anything about whether Fred will 
or will not go on vacation, nor would it say anything about what will happen if Fred 
does not go on vacation. In particular, if Fred did not go on vacation, then it would 
not contradict this statement if Fred read a book nonetheless. 

Now consider the statement “If grass is green, then Paris is in France.” Is this 
statement true? In colloquial usage, this statement would seem strange, because there 
does not seem to be any inherent connection, not to mention causality, between the first
part of the sentence and the second. In mathematical usage, however, we want to be 
able to decide whether a statement of any form is true simply by knowing the truth or 
falsity of each of its component statements, without having to assess something more 
vague such as causality. For example, the statement “Cows make milk and cars make 
noise” is certainly true, even though the two parts of the sentence are not inherently
connected. Similarly, the statement “If grass is green, then Paris is in France” also 
ought to be decidable as true or false depending only upon whether “grass is green” 
and “Paris is in France” are each true or false. As in the previous paragraph, we take 
the approach that a statement of the form “if P then Q” should be true if it is not 
the case that P is true and Q is false. Therefore, because grass is indeed green and 
Paris is indeed in France, the statement “If grass is green, then Paris is in France” is 
true. This approach to the notion of “if ... then ...” is somewhat different from the 
colloquial use of the term, just as our uses of “and” and “or” were not the same as 
their colloquial uses. We formalize this approach as follows. 

Our fourth construction, the conditional from P to Q, which is denoted P → Q,
is the statement that, intuitively, is true if it is never the case that P is true and Q is
false. We read P → Q as “if P then Q.” The precise definition of P → Q is given in
the truth table

P | Q | P → Q
T | T |   T
T | F |   F
F | T |   T
F | F |   T

The first two rows of the truth table are fairly reasonable intuitively. If P is true
and Q is true, then certainly P → Q should be true; if P is true and Q is false, then
P → Q should be false. The third and fourth rows of the truth table, which say that
the statement P → Q is true whenever P is false, regardless of the value of Q, are less
intuitively obvious. There is, however, no other plausible way to fill in these rows,
given that we want the entries in the truth table to depend only on the truth or falsity
of P and Q, and that the one situation with which we are primarily concerned is that
we do not want P to be true and Q to be false. Moreover, if we were to make the
value of P → Q false in the third and fourth rows, we would obtain a truth table that
is identical to the truth table for P ∧ Q, which would make P → Q redundant. The
above truth table for P → Q, which is universally accepted by mathematicians and
logicians, may seem strange at first glance, and perhaps even contrary to intuition,
but it is important to get used to it, because we will always use P → Q as we have
defined it.
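
As an informal check of the discussion above, the conditional can be modeled as a function of two truth values. The Python sketch below is an illustration only; the names conditional and altered are ours, and altered is a hypothetical variant, introduced purely to test the remark that making the third and fourth rows false would reproduce the truth table for conjunction.

```python
from itertools import product


def conditional(p: bool, q: bool) -> bool:
    """P -> Q: false only in the one excluded case, p true and q false."""
    return not (p and not q)


def altered(p: bool, q: bool) -> bool:
    """Hypothetical variant: like conditional, but false whenever p is false."""
    return conditional(p, q) if p else False


# Rows in the order TT, TF, FT, FF.
rows = list(product([True, False], repeat=2))

for p, q in rows:
    print(p, q, conditional(p, q))

# The altered table agrees with conjunction in every row,
# so it would add nothing new to our list of constructions.
print(all(altered(p, q) == (p and q) for p, q in rows))
```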

A simple example of a conditional statement is “if it rains today, then I will see a 
movie this evening.” This statement has the form P → Q, where P = “it rains today,”
and Q = “I will see a movie this evening.” The truth of this statement does not say
that it is raining today, nor that I will see a movie this evening. It only says what will 
happen if it rains today, which is that I will see a movie this evening. If it does not 
rain, I still might see a movie this evening, or I might not; both of these possibilities 
would be consistent with the truth of the original statement “if it rains today, then I 
will see a movie this evening.” 

Although it is standard to write P → Q, it is not the order of writing that counts,
but the logical relationship. It would be identical to write Q ← P instead of P → Q.
Either way, each of P and Q has a specified (and distinct) role. By contrast, if we
write Q → P, then we have switched the roles of Q and P, resulting in a statement
that is not equivalent to P → Q (as will be discussed in Section 1.3).

There are a number of variations as to how to write the statement P → Q in
English. In addition to writing “if P then Q,” we could just as well write any of the 
following: 


If P, Q; 

Q if P; 

P only if Q; 

Q provided that P; 
Assuming that P, then Q; 
Q given that P; 

P is sufficient for Q; 

Q is necessary for P. 


These variants are each useful in particular situations. For example, the statement 
“if it rains today, then I will see a movie this evening” could just as well be written 
“I will see a movie this evening if it rains today.” It would also be formally correct to
say “it is raining today is sufficient for me to see a movie this evening,” though such 
a sentence would, of course, be rather awkward. 

Our fifth construction, the biconditional from P to Q, which is denoted P ↔ Q,
is the statement that, intuitively, is true if P and Q are both true or both false, and is
false otherwise. We read P ↔ Q as “P if and only if Q.” The phrase “if and only if” is
often abbreviated as “iff.” The precise definition of P ↔ Q is given in the truth table

P | Q | P ↔ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   T

An example of a biconditional statement is “I will go for a walk if and only if 
Fred will join me.” This statement has the form P ↔ Q, where P = “I will go for
a walk,” and Q = “Fred will join me.” The truth of this statement does not say that 
I will go for a walk, or that Fred will join me. It says that either Fred will join me 
and I will go for a walk, or that neither of these things will happen. In other words, 
it could not be the case that Fred joins me and yet I do not go for a walk, and it also 
could not be the case that I go for a walk, and yet Fred has not joined me. 

There are some variations as to how to write the statement P ↔ Q in English. In
addition to writing “P if and only if Q,” it is common to write “P is necessary and 
sufficient for Q.” 

In Section 1.3 we will clarify further the meaning of biconditional statements. 
Among other things, we will see that the order of writing a biconditional statement 
makes no difference, that is, it makes no difference whether we write P ↔ Q or
Q ↔ P.

Now that we have defined our five basic ways of combining statements, we can 
form more complicated compound statements by using combinations of the basic 
operations. For example, we can form P ∨ (Q → ¬R) out of statements P, Q and R.
We need to use parentheses in this compound statement, to make sure it is unambiguous.
We use the standard convention that ¬ takes precedence over the other four
operations, but none of these four takes precedence over the others. Hence, writing
“P ∨ Q → ¬R” would be ambiguous, and we would never write such an expression.

We can form the truth table for the statement P ∨ (Q → ¬R), doing one operation
at a time, as follows: 


P | Q | R | ¬R | Q → ¬R | P ∨ (Q → ¬R)
T | T | T | F  |   F    |      T
T | T | F | T  |   T    |      T
T | F | T | F  |   T    |      T
T | F | F | T  |   T    |      T
F | T | T | F  |   F    |      F
F | T | F | T  |   T    |      T
F | F | T | F  |   T    |      T
F | F | F | T  |   T    |      T


To save time and effort, it is possible to write a smaller truth table with the same 
information as the truth table above, by writing one column at a time, and labeling 
the columns in the order of how we write them. In the truth table shown below, we 
first write columns 1 and 2, which are just copies of the P and Q columns; we then
write column 3, which is the negation of the R column; column 4 is formed from
columns 2 and 3, and column 5 is formed from columns 1 and 4. We put the label
“5” in a box, to highlight that its column is the final result of the truth table, and 
refers to the compound statement in which we are interested. It is, of course, the 
same result as in the previous truth table. 


P | Q | R | P  ∨  (Q  →  ¬R)
T | T | T | T  T   T  F   F
T | T | F | T  T   T  T   T
T | F | T | T  T   F  T   F
T | F | F | T  T   F  T   T
F | T | T | F  F   T  F   F
F | T | F | F  T   T  T   T
F | F | T | F  T   F  T   F
F | F | F | F  T   F  T   T
            1 [5]  2  4   3
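
The column-by-column procedure described above is easy to mechanize. The following Python sketch is one possible illustration, not a prescribed method; the helper names conditional, statement, mark and truth_table are our own, and the rows are generated in the same order as in the tables above (from all T down to all F).

```python
from itertools import product


def conditional(p: bool, q: bool) -> bool:
    """P -> Q, following the truth table for the conditional."""
    return (not p) or q


def statement(p: bool, q: bool, r: bool) -> bool:
    """The compound statement P or (Q -> not R)."""
    return p or conditional(q, not r)


def mark(value: bool) -> str:
    """Render a truth value the way the tables in the text do."""
    return "T" if value else "F"


def truth_table(func, num_components: int):
    """Return (assignment, value) pairs, one per row of the truth table."""
    assignments = product([True, False], repeat=num_components)
    return [(values, func(*values)) for values in assignments]


for values, result in truth_table(statement, 3):
    print(" | ".join(mark(v) for v in values), "|", mark(result))
```

The last printed column should agree with the final column of the tables above, with the single F occurring in the row where P is false and both Q and R are true.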


Just as we can form compound statements written with symbols, we can also 
form such statements written in English. The role that parentheses play in avoiding 
ambiguity in statements written with symbols is often played in English sentences 
by punctuation. For example, the sentence “I like to eat apples or pears, and I like 
to eat peaches” is unambiguous. If we let A = “I like to eat apples,” let B = “I like
to eat pears” and let C = “I like to eat peaches,” then the sentence can be written in
symbols as (A ∨ B) ∧ C. On the other hand, suppose that we were given the statement
(A ∨ B) ∧ C, and were told to translate it into English, knowing that A = “I like to eat
apples,” etc., but without knowing that the statement had originally been formulated 
in English. A careful translation into English might result in the original statement, 
or in some equally valid variant, such as “I like to eat apples or I like to eat pears,
and I like to eat peaches.” Unfortunately, imprecise translations such as “I like to eat
apples or pears and peaches,” or “I like to eat apples, or I like to eat pears, and I like to
eat peaches,” are often made. These two statements are ambiguous; the ambiguity in 
the first statement results from the lack of necessary punctuation, and the ambiguity 
in the second statement results from incorrect punctuation. In both these statements 
the problem with the punctuation is not a matter of grammar, but rather of capturing 
accurately and unambiguously the meaning of the statement (A ∨ B) ∧ C.
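
To see that this ambiguity is not merely cosmetic, we can compare two ways of grouping the same three component statements. The Python sketch below assumes that the competing reading of the unpunctuated sentence is A ∨ (B ∧ C); that choice, and the names reading_one and reading_two, are our own illustration.

```python
from itertools import product


def reading_one(a: bool, b: bool, c: bool) -> bool:
    """(A or B) and C: the intended meaning."""
    return (a or b) and c


def reading_two(a: bool, b: bool, c: bool) -> bool:
    """A or (B and C): an assumed alternative grouping."""
    return a or (b and c)


# Collect every assignment of truth values on which the two readings disagree.
differences = [
    values
    for values in product([True, False], repeat=3)
    if reading_one(*values) != reading_two(*values)
]
print(differences)
```

The two readings disagree whenever A is true and C is false, so a reader who groups the sentence differently from the writer may be led to a different truth value.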

We end this section with a brief mention of two important concepts. A tautology 
is a statement that is always true by logical necessity, regardless of whether the component
statements are true or false, and regardless of what we happen to observe in
the real world. A contradiction is a statement that is always false by logical neces- 
sity. Most statements we encounter will be neither of these types. For example, the 
statement “Irene has red hair” is neither a tautology nor a contradiction, because it 
is not necessarily either true or false—it is logically plausible that Irene does have 
red hair, and it is just as plausible that she does not. Even the statement “1 ≠ 2” is
not a tautology. It is certainly true in our standard mathematical system, as far as we 
know, but the truth of this statement is an observation about the way human beings 
have constructed their number system, not a logical necessity. 

An example of a tautology is the statement “Irene has red hair or she does not 
have red hair.” It seems intuitively clear that this statement is a tautology, and we can 
verify this fact formally by using truth tables. Let P = “Irene has red hair.” Then our 
purported tautology is the statement P ∨ ¬P. The truth table for this statement is


P | P  ∨  ¬P
T | T  T   F
F | F  T   T
    1 [3]  2


We see in column 3 that the statement P ∨ ¬P is always true, regardless of whether P
is true or false. This fact tells us that P ∨ ¬P is a tautology. In general, a statement is
a tautology if, as verified using a truth table, it is always true, regardless of whether 
its component statements are true or false. 

The statement “Irene has red hair and she does not have red hair” is a contradiction.
In symbols this statement is P ∧ ¬P, and it has truth table


P | P  ∧  ¬P
T | T  F   F
F | F  F   T
    1 [3]  2


The statement P ∧ ¬P is always false, regardless of whether P is true or false. In
general, a statement is a contradiction if, as verified using a truth table, it is always 
false, regardless of whether its component statements are true or false. 
That P ∨ ¬P is a tautology, and that P ∧ ¬P is a contradiction, seems quite intuitively
reasonable. It is possible, however, to have more complicated (and not so
intuitive) tautologies and contradictions. For example, the truth table of the statement 


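[(P ∧ Q) → R] → [P → (Q → R)] is


P | Q | R | [(P  ∧  Q)  →  R]  →  [P  →  (Q  →  R)]
T | T | T |   T  T  T   T  T   T   T  T   T  T  T
T | T | F |   T  T  T   F  F   T   T  F   T  F  F
T | F | T |   T  F  F   T  T   T   T  T   F  T  T
T | F | F |   T  F  F   T  F   T   T  T   F  T  F
F | T | T |   F  F  T   T  T   T   F  T   T  T  T
F | T | F |   F  F  T   T  F   T   F  T   T  F  F
F | F | T |   F  F  F   T  T   T   F  T   F  T  T
F | F | F |   F  F  F   T  F   T   F  T   F  T  F
              1  7  2   8  3 [11]  4 10   5  9  6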


We see in column 11 that the statement [(P ∧ Q) → R] → [P → (Q → R)] is always
true, regardless of whether each of P, Q and R is true or false. Hence the statement
is a tautology. Suppose that P = “Sam is sad,” let Q = “Warren is sad” and R =
“Sam and Warren eat pasta.” Then the statement becomes “If it is true that if Sam 
and Warren are both sad then they eat pasta, then it is true that if Sam is sad, then if 
Warren is sad they eat pasta.” 

As an example of a contradiction, the reader can verify with a truth table that the 
statement [Q → (P ∧ ¬Q)] ∧ Q is always false.
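
Checking whether a statement is a tautology or a contradiction amounts to inspecting the final column of its truth table, which can be done by brute force. The Python sketch below is a minimal illustration under our own naming (final_column, is_tautology, is_contradiction and the three example functions are not from the text); it simply tries every combination of truth values.

```python
from itertools import product


def conditional(p: bool, q: bool) -> bool:
    """P -> Q, following the truth table for the conditional."""
    return (not p) or q


def final_column(statement, num_components: int):
    """Truth values of the statement, one per row of its truth table."""
    assignments = product([True, False], repeat=num_components)
    return [statement(*values) for values in assignments]


def is_tautology(statement, num_components: int) -> bool:
    return all(final_column(statement, num_components))


def is_contradiction(statement, num_components: int) -> bool:
    return not any(final_column(statement, num_components))


def excluded_middle(p):
    """P or not P."""
    return p or not p


def nested_conditionals(p, q, r):
    """[(P and Q) -> R] -> [P -> (Q -> R)]."""
    return conditional(conditional(p and q, r), conditional(p, conditional(q, r)))


def example_contradiction(p, q):
    """[Q -> (P and not Q)] and Q."""
    return conditional(q, p and not q) and q


print(is_tautology(excluded_middle, 1))
print(is_tautology(nested_conditionals, 3))
print(is_contradiction(example_contradiction, 2))
```

A statement such as P by itself is neither a tautology nor a contradiction, and both checks return False for it.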


Exercises 


Exercise 1.2.1. Which of the following expressions are statements? 


(1) Today is a nice day. 

(2) Go to sleep. 

(3) Is it going to snow tomorrow? 

(4) The U.S. has 49 states. 

(5) I like to eat fruit, and you often think about traveling to Spain. 
(6) If we go out tonight, the babysitter will be unhappy. 

(7) Call me on Thursday if you are home. 


Exercise 1.2.2. Which of the following expressions are statements? 


(1) 4 < 3.
(2) If x > 2 then x² > 1.
(3) y < 7.
(4) x + y = z.
(5) (a + b)² = a² + 2ab + b².
(6) a² + b² = c².
(7) If w = 3 then z^w ≠ 0.


Exercise 1.2.3. Let P = “I like fruit,” let Q = “I do not like cereal” and R = “I know
how to cook an omelette.” Translate the following statements into words. 




(1) P ∧ Q.
(2) Q ∨ R.
(3) ¬R.
(4) ¬(P ∨ Q).
(5) ¬P ∨ ¬Q.
(6) ¬P ∨ Q.
(7) (R ∧ P) ∨ Q.
(8) R ∧ (P ∨ Q).


Exercise 1.2.4. Let X = “I am happy,” let Y = “I am watching a movie” and Z = “I
am eating spaghetti.” Translate the following statements into words.


(1) Z → X.
(2) X ↔ Y.
(3) (Y ∨ Z) → X.
(4) Y ∨ (Z → X).
(5) (Y → ¬X) ∧ (Z → ¬X).
(6) (X ∧ ¬Y) ↔ (Y ∨ Z).


Exercise 1.2.5. Let X = “Fred has red hair,” let Y = “Fred has a big nose” and R = 
“Fred likes to eat figs.” Translate the following statements into symbols. 


(1) Fred does not like to eat figs. 

(2) Fred has red hair, and does not have a big nose. 

(3) Fred has red hair or he likes to eat figs. 

(4) Fred likes to eat figs, and he has red hair or he has a big nose. 

(5) Fred likes to eat figs and he has red hair, or he has a big nose. 

(6) It is not the case that Fred has a big nose or he has red hair. 

(7) It is not the case that Fred has a big nose, or he has red hair. 

(8) Fred has a big nose and red hair, or he has a big nose and likes to eat figs. 


Exercise 1.2.6. Let E = “The house is blue,” let F = “The house is 30 years old” 
and G = “The house is ugly.” Translate the following statements into symbols. 


(1) If the house is 30 years old, then it is ugly. 

(2) If the house is blue, then it is ugly or it is 30 years old. 

(3) If the house is blue then it is ugly, or it is 30 years old. 

(4) The house is not ugly if and only if it is 30 years old. 

(5) The house is 30 years old if it is blue, and it is not ugly if it is 30 years old. 

(6) For the house to be ugly, it is necessary and sufficient that it be ugly and 30 
years old. 


Exercise 1.2.7. Suppose that A is a true statement, that B is a false statement, that C 
is a false statement and that D is a true statement. Which of the following statements 
are true, and which are false? 


(1) A ∨ C.
(2) (C ∧ D) ∨ B.
(3) ¬(A ∧ B).
(4) ¬D ∨ ¬C.
(5) (D ∧ A) ∨ (B ∧ C).
(6) C ∨ [D ∨ (A ∧ B)].


Exercise 1.2.8. Suppose that X is a false statement, that Y is a true statement, that Z 
is a false statement and that W is a true statement. Which of the following statements 
are true, and which are false? 
(1) Z → Y.
(2) X ↔ Z.
(3) (Y ↔ W) ∧ X.
(4) W → (X → ¬W).
(5) [(Y → W) ↔ W] ∧ ¬X.
(6) (W → X) ↔ ¬(Z ∨ Y).


Exercise 1.2.9. Suppose that Flora likes fruit, does not like carrots, likes nuts and 
does not like rutabagas. Which of the following statements are true, and which are 
false? 


(1) Flora likes fruit and carrots. 

(2) Flora likes nuts or rutabagas, and she does not like carrots. 

(3) Flora likes carrots, or she likes fruit and nuts. 

(4) Flora likes fruit or nuts, and she likes carrots or rutabagas. 

(5) Flora likes rutabagas, or she likes fruit and either carrots or rutabagas. 


Exercise 1.2.10. Suppose that Hector likes beans, does not like peas, does not like 
lentils and likes sunflower seeds. Which of the following statements are true, and 
which are false? 


(1) If Hector likes beans, then he likes lentils. 

(2) Hector likes lentils if and only if he likes peas. 

(3) Hector likes sunflower seeds, and if he likes lentils then he likes beans. 

(4) Hector likes peas and sunflower seeds if he likes beans. 

(5) If Hector likes lentils then he likes sunflower seeds, or Hector likes lentils if 
and only if he likes peas. 

(6) For Hector to like beans and lentils it is necessary and sufficient for him to 
like peas or sunflower seeds. 


Exercise 1.2.11. Make a truth table for each of the following statements. 


(1) P ∧ ¬Q.
(2) (R ∨ S) ∧ ¬R.
(3) X ∨ (¬Y ∨ Z).
(4) (A ∨ B) ∧ (A ∨ C).
(5) (P ∧ R) ∨ ¬(Q ∧ S).


Exercise 1.2.12. Make a truth table for each of the following statements. 


(1) X → ¬Y.
(2) (R → S) ↔ R.
(3) ¬M → (N ∧ L).
(4) (E ↔ F) → (E ↔ G).
(5) (P → R) ∨ ¬(Q ↔ S).


Exercise 1.2.13. Which of the following statements are tautologies, which are con- 
tradictions and which are neither? 


(1) P ∨ (¬P ∧ Q).
(2) (X ∨ Y) ↔ (¬X → Y).
(3) (A ∧ ¬B) ∧ (¬A ∨ B).
(4) [Z ∨ (¬Z ∨ W)] ∧ ¬(W ∧ U).
(5) [L → (M → N)] → [M → (L → N)].
(6) [(X ↔ Z) ∧ (X ↔ Y)] ∧ X.
(7) [(P ↔ ¬Q) ∧ P] ∧ Q.


Exercise 1.2.14. Which of the following statements are tautologies, which are con- 
tradictions and which are neither? 


(1) If John eats a blueberry pizza, then he either eats a blueberry pizza or he does 
not. 

(2) If John either eats a blueberry pizza or he does not, then he eats a blueberry 
pizza. 

(3) If pigs have wings and pigs do not have wings, then the sun sets in the east. 

(4) If Ethel goes to the movies then Agnes will eat a cake, and Agnes does not 
eat cake, and Ethel goes to the movies. 

(5) Rabbits eat cake or pie, and if rabbits eat pie then they eat cake. 

(6) The cow is green or the cow is not green, if and only if the goat is blue and 
the goat is not blue. 


Exercise 1.2.15. Let P be a statement, let T be a tautology and let C be a contradic- 
tion. 


(1) Show that P ∨ T is a tautology.
(2) Show that P ∧ C is a contradiction.


1.3 Relations Between Statements 


Up until now we have constructed statements; now we want to discuss relations be- 
tween them. Relations between statements are not formal statements in themselves, 
but are “meta-statements” that we make about statements. An example of a meta- 
statement is the observation that “if the statement ‘Ethel is tall and Agnes is short’ 
is true, then the statement ‘Ethel is tall’ is true.” Another example is “the statement
‘Irving has brown hair or Mel has red hair’ being true is equivalent to the statement 
‘Mel has red hair or Irving has brown hair’ being true.” Of course, we will need 
to clarify what it means for one statement to imply another, or be equivalent to an- 
other, but whatever the formal approach to these concepts is, intuitively the above 
two meta-statements seem correct. 

It might be objected that the above examples of meta-statements are in fact
statements in themselves, which is true enough informally, though in a formal setting, 
which we are not presenting here, there is indeed a difference between a well-formed 
statement in a given formal language and a meta-statement that we might make about 
such formal statements. In practice, the distinction between statements and meta- 
statements is straightforward enough for us to make use of it here. 

The two examples of relations between statements given above represent the two 
types of such relations we will study, namely, implication and equivalence, which 
are the meta-statement analogs of conditionals and biconditionals. We start with im- 
plication. 

The intuitive idea of logical implication is that statement P implies statement Q if 
necessarily Q is true whenever P is true. In other words, it can never be the case that 
P is true and Q is false. Necessity is the key here, because one statement implying
another should not simply be a matter of coincidentally appropriate truth values. 
Consider the statements P = “the sky is blue” and Q = “grass is green.” Given what 
we know about sky and grass, the statement “if the sky is blue then grass is green” is 
certainly true (that is, the statement P → Q is true), because both P and Q are true.
However, and this is the key point, we would not want to say that “the sky is blue” 
logically implies “grass is green,” because logical implication should not depend 
upon the particular truth values of the particular statements. What would happen if, 
due to some environmental disaster, all the grass in the world suddenly turned black, 
although the sky still stayed blue? Then the statement “if the sky is blue then grass is
green” would be false. Because this possibility could in principle happen, we do not
say that “the sky is blue” implies “grass is green.” In general, even though P → Q
happens to be true now, given that it might be false under other circumstances, we
cannot say that P implies Q. To have P imply Q, we need P → Q to be true under all
possible circumstances. 

Now consider the two statements “it is not the case that, if Susan thinks Lisa is 
cute then she likes Lisa” and “Susan thinks Lisa is cute or she likes Lisa.” Whether or 
not each of these statements is actually true or false depends upon knowing whether 
or not Susan thinks Lisa is cute, and whether or not Susan likes Lisa. What will 
always be the case, as we will soon see, is that the statement “it is not the case that, 
if Susan thinks Lisa is cute then she likes Lisa” implies the statement “Susan thinks 
Lisa is cute or she likes Lisa,” regardless of whether each component statement is 
true or false. 

Let P = “Susan thinks Lisa is cute” and Q = “Susan likes Lisa.” Then we want to 
show that ¬(P → Q) implies P ∨ Q. We show this implication in two ways. First, we 
check the truth tables for each of ¬(P → Q) and P ∨ Q, which are 


    P | Q | ¬ (P → Q)          P | Q | P ∨ Q
    T | T | F  T T T           T | T | T T T
    T | F | T  T F F           T | F | T T F
    F | T | F  F T T           F | T | F T T
    F | F | F  F T F           F | F | F F F

            4  1 3 2                   1 3 2


The column numbered 4 in the first truth table has the truth values for ¬(P → Q), 
and the column numbered 3 in the second truth table has the truth values for P ∨ Q. 
We observe that in any row that has a T as the truth value for ¬(P → Q), there is 
also a T for the truth value of P ∨ Q (there is only one such row in this case, but that 
is immaterial). It makes no difference what happens in the rows in which ¬(P → Q) 
has truth value F. Hence ¬(P → Q) logically implies P ∨ Q. 

Alternatively, rather than having two truth tables to compare, we can use the 
conditional (defined in Section 1.2) to recognize that our observations about the above 
two truth tables are the same as saying that the single statement [¬(P → Q)] → (P ∨ Q) 
will always be true, regardless of the truth or falsity of P and Q. In other words, the 
statement [¬(P → Q)] → (P ∨ Q) will be a tautology (also defined in Section 1.2), as 
can be seen in the truth table 


    P | Q | [¬ (P → Q)] → (P ∨ Q)
    T | T |  F  T T T   T  T T T
    T | F |  T  T F F   T  T T F
    F | T |  F  F T T   T  F T T
    F | F |  F  F T F   T  F F F

             4  1 3 2   8  5 7 6


We see in Column 8 that the statement [¬(P → Q)] → (P ∨ Q) is always true, and 
hence it is indeed a tautology. 

This last consideration leads to the precise notion of implication. Let P and Q 
be statements. We say that P implies Q if the statement P → Q is a tautology. We 
abbreviate the English expression “P implies Q” with the notation “P ⇒ Q.” 

It is important to note the difference between the notations “P → Q” and “P ⇒ 
Q.” The notation “P → Q” is a statement; it is a compound statement built up out of 
the statements P and Q. The notation “P ⇒ Q” is a meta-statement, which is simply 
a shorthand way of writing the English expression “P implies Q,” and it means that 
P → Q is not just true in some particular instances, but is a tautology. 
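
The definition can be checked mechanically for small examples. The following is a minimal Python sketch, not part of the text's development; the helper names `cond` and `is_tautology` are assumptions introduced here for illustration. It enumerates every assignment of truth values and tests whether a conditional is a tautology.

```python
from itertools import product

def cond(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def is_tautology(statement, num_vars):
    """Return True if statement is true under every assignment of truth values."""
    return all(statement(*values)
               for values in product([True, False], repeat=num_vars))

# not(P -> Q) implies (P or Q): the conditional between them is a tautology.
susan_implies = is_tautology(lambda p, q: cond(not cond(p, q), p or q), 2)

# "the sky is blue" does not imply "grass is green":
# P -> Q is false in the row where P is true and Q is false.
sky_implies = is_tautology(lambda p, q: cond(p, q), 2)

print(susan_implies, sky_implies)  # True False
```

The second check fails even though both component statements happen to be true in our world, which is exactly the distinction between a true conditional and an implication.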

It might appear at first glance as if we are not introducing anything new here, 
given that we are defining implication in terms of conditional statements, but there 
is a significant new idea in the present discussion, which is that we single out those 
situations where P → Q is not just a statement (which is always the case), but where 
P → Q is a tautology. Moreover, we will see in Section 1.4 that implications of 
statements will be extremely useful in constructing valid arguments. In particular, 
the following implications will be used extensively. 


Fact 1.3.1. Let P, Q, R and S be statements. 


1. (P → Q) ∧ P ⇒ Q (Modus Ponens). 
2. (P → Q) ∧ ¬Q ⇒ ¬P (Modus Tollens). 
3. P ∧ Q ⇒ P (Simplification). 
4. P ∧ Q ⇒ Q (Simplification). 
5. P ⇒ P ∨ Q (Addition). 
6. Q ⇒ P ∨ Q (Addition). 
7. (P ∨ Q) ∧ ¬P ⇒ Q (Modus Tollendo Ponens). 
8. (P ∨ Q) ∧ ¬Q ⇒ P (Modus Tollendo Ponens). 
9. P ↔ Q ⇒ P → Q (Biconditional-Conditional). 
10. P ↔ Q ⇒ Q → P (Biconditional-Conditional). 
11. (P → Q) ∧ (Q → P) ⇒ P ↔ Q (Conditional-Biconditional). 
12. (P → Q) ∧ (Q → R) ⇒ P → R (Hypothetical Syllogism). 
13. (P → Q) ∧ (R → S) ∧ (P ∨ R) ⇒ Q ∨ S (Constructive Dilemma). 


Demonstration. We will show that Part (1) holds, leaving the rest to the reader in 
Exercise 1.3.6. 


(1). To demonstrate that (P → Q) ∧ P ⇒ Q, we need to show that the statement 
[(P → Q) ∧ P] → Q is a tautology, which we do with the truth table 


    P | Q | [(P → Q) ∧ P] → Q
    T | T |   T T T  T T  T T
    T | F |   T F F  F T  T F
    F | T |   F T T  F F  T T
    F | F |   F T F  F F  T F

              1 3 2  5 4  7 6


We see in Column 7 that the statement [(P → Q) ∧ P] → Q is always true, and hence 
it is a tautology. /// 
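
The remaining parts of Fact 1.3.1 can be sanity-checked by brute force before they are demonstrated by hand. The sketch below is an illustration only, written in Python under the assumption that each part is encoded as a pair of functions (hypothesis, conclusion) of the four statements P, Q, R and S; the names `PARTS`, `cond` and `holds` are hypothetical. A computer check of this kind is not a substitute for the truth tables requested in Exercise 1.3.6.

```python
from itertools import product

def cond(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

# Hypothesis and conclusion of each part of Fact 1.3.1, as functions of P, Q, R, S.
PARTS = {
    1: (lambda p, q, r, s: cond(p, q) and p, lambda p, q, r, s: q),
    2: (lambda p, q, r, s: cond(p, q) and not q, lambda p, q, r, s: not p),
    3: (lambda p, q, r, s: p and q, lambda p, q, r, s: p),
    4: (lambda p, q, r, s: p and q, lambda p, q, r, s: q),
    5: (lambda p, q, r, s: p, lambda p, q, r, s: p or q),
    6: (lambda p, q, r, s: q, lambda p, q, r, s: p or q),
    7: (lambda p, q, r, s: (p or q) and not p, lambda p, q, r, s: q),
    8: (lambda p, q, r, s: (p or q) and not q, lambda p, q, r, s: p),
    9: (lambda p, q, r, s: p == q, lambda p, q, r, s: cond(p, q)),
    10: (lambda p, q, r, s: p == q, lambda p, q, r, s: cond(q, p)),
    11: (lambda p, q, r, s: cond(p, q) and cond(q, p), lambda p, q, r, s: p == q),
    12: (lambda p, q, r, s: cond(p, q) and cond(q, r), lambda p, q, r, s: cond(p, r)),
    13: (lambda p, q, r, s: cond(p, q) and cond(r, s) and (p or r),
         lambda p, q, r, s: q or s),
}

def holds(hypothesis, conclusion):
    """True if hypothesis -> conclusion is a tautology over all 16 rows."""
    return all(cond(hypothesis(*v), conclusion(*v))
               for v in product([True, False], repeat=4))

# Collect the numbers of any parts that fail; none should.
failed = [n for n, (h, c) in PARTS.items() if not holds(h, c)]
print(failed)  # []
```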


The implications stated in Fact 1.3.1 were chosen because they are symbolic 
statements of various rules of valid argumentation. Consider, for example, Part (7). 
Suppose that P = “the cow has a big nose” and Q = “the cow has a small head.” 
Translating our statement yields “the cow has a big nose or a small head, and the 
cow does not have a big nose” implies “the cow has a small head.” This implication 
is indeed intuitively reasonable. The implications stated in Fact 1.3.1 will be used in 
Section 1.4, and so we will not discuss them in detail here. 

Logical implication is not always reversible. For example, we saw that “it is not 
the case that, if Susan thinks Lisa is cute then she likes Lisa” implies “Susan thinks 
Lisa is cute or she likes Lisa.” Written in symbols, we saw that ¬(P → Q) ⇒ P ∨ Q. 
On the other hand, the same truth tables used to establish this implication also show 
that P ∨ Q does not imply ¬(P → Q). For example, when P and Q are both true, then 
P ∨ Q is true, but ¬(P → Q) is false. Alternatively, it can be seen by a truth table that 
(P ∨ Q) → [¬(P → Q)] is not a tautology. Hence “Susan thinks Lisa is cute or she 
likes Lisa” does not imply “it is not the case that, if Susan thinks Lisa is cute then 
she likes Lisa.” 
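
To see concretely which rows defeat the reverse implication, one can list the assignments in which the hypothesis is true and the conclusion is false. The following Python sketch is illustrative only; the function name `counterexamples` is an assumption, and truth values are listed as pairs (P, Q).

```python
from itertools import product

def cond(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def counterexamples(hypothesis, conclusion, num_vars):
    """Rows in which the hypothesis is true but the conclusion is false."""
    return [values
            for values in product([True, False], repeat=num_vars)
            if hypothesis(*values) and not conclusion(*values)]

# not(P -> Q) implies P or Q: no row breaks it.
forward = counterexamples(lambda p, q: not cond(p, q), lambda p, q: p or q, 2)

# P or Q does not imply not(P -> Q): two rows break it.
backward = counterexamples(lambda p, q: p or q, lambda p, q: not cond(p, q), 2)

print(forward)   # []
print(backward)  # [(True, True), (False, True)]
```

The first row of `backward` is the case mentioned above, in which P and Q are both true.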

Some logical implications, however, are reversible. Such implications are very 
convenient, and they convey the idea of logical equivalence, to which we now turn. 
Certainly, two different English sentences can convey equivalent statements, for ex- 
ample “if it rains I will stay home” and “I will stay home if it rains.” These two 
statements are both English variants of P → Q, where P = “it rains,” and Q = “I 
will stay home.” The difference between these two statements is an issue only of the 
flexibility of the English language; symbolically, these two statements are identical, 
not just equivalent. 

What interests us are logically equivalent statements that are not simply English 
variants of the same symbolic statement, but rather are truly different statements. For 
example, the statement “it is not the case that I do not own a bicycle” will be seen to 
be equivalent to “I own a bicycle.” If we let P = “I own a bicycle,” then the statement 
“it is not the case that I do not own a bicycle” is ¬(¬P). This statement is not identi- 
cal to P. It will be very important to us to be able to recognize that some non-identical 
statements, for example ¬(¬P) and P, are in fact logically equivalent. Such equiv- 
alences will allow us to find alternative forms of the statements of some theorems, 
and these alternative forms are sometimes easier to prove than the originals. 

The intuitive idea of equivalence of statements is that to claim that statements 
P and Q are equivalent means that necessarily P is true if and only if Q is true. 
Necessity is once again the key here, as can be seen once more using the statements 
“the sky is blue” and “grass is green,” which are not equivalent, even though both are 
true. By contrast, consider the two statements “if Fred has good taste in food, then 
he likes to eat liver” and “if Fred does not like to eat liver, then he does not have 
good taste in food.” We will show that these statements are equivalent, as follows. 
Let P = “Fred has good taste in food” and Q = “Fred likes to eat liver.” Then we 
want to show the equivalence of P → Q and ¬Q → ¬P. We need to see that each of 
these two statements is true when the other is true, and each is false when the other 
is false. Once again we can use truth tables. If we use separate truth tables, we see 
that 


    P | Q | P → Q          P | Q | ¬Q → ¬P
    T | T | T T T          T | T |  F T  F
    T | F | T F F          T | F |  T F  F
    F | T | F T T          F | T |  F T  T
    F | F | F T F          F | F |  T T  T

            1 3 2                   1 3  2


The columns numbered 3 in the truth tables have the truth values for P → Q and 
¬Q → ¬P respectively. These columns are identical, which says that P → Q is true 
if and only if ¬Q → ¬P is true. We can avoid having to compare two truth tables, 
this time by using the biconditional (defined in Section 1.2). The equality of the truth 
values of our two statements in the two truth tables above is the same as saying that 
the single statement (P → Q) ↔ (¬Q → ¬P) is a tautology, as can be seen in the 
truth table 


    P | Q | (P → Q) ↔ (¬Q → ¬P)
    T | T |  T T T  T   F T  F
    T | F |  T F F  T   T F  F
    F | T |  F T T  T   F T  T
    F | F |  F T F  T   T T  T

             1 3 2  7   4 6  5


We see in Column 7 that the statement (P → Q) ↔ (¬Q → ¬P) is always true, and 
hence it is a tautology. 

In general, let P and Q be statements. We say that P and Q are equivalent if the 
statement P ↔ Q is a tautology. We abbreviate the English expression “P and Q are 
equivalent” with the notation “P ⇔ Q.” 

It is important to note the difference between the notations “P ⇔ Q” and “P ↔ 
Q.” The latter is a statement, whereas the former is a meta-statement, which is simply 
a shorthand way of writing the English expression “P is equivalent to Q.” 

Listed below are some equivalences of statements that will be particularly useful. 
We will discuss some of these equivalences after stating them. 


Fact 1.3.2. Let P, Q and R be statements. 


1. ¬(¬P) ⇔ P (Double Negation). 
2. P ∨ Q ⇔ Q ∨ P (Commutative Law). 
3. P ∧ Q ⇔ Q ∧ P (Commutative Law). 
4. (P ∨ Q) ∨ R ⇔ P ∨ (Q ∨ R) (Associative Law). 
5. (P ∧ Q) ∧ R ⇔ P ∧ (Q ∧ R) (Associative Law). 
6. P ∧ (Q ∨ R) ⇔ (P ∧ Q) ∨ (P ∧ R) (Distributive Law). 
7. P ∨ (Q ∧ R) ⇔ (P ∨ Q) ∧ (P ∨ R) (Distributive Law). 
8. P → Q ⇔ ¬P ∨ Q. 
9. P → Q ⇔ ¬Q → ¬P (Contrapositive). 
10. P ↔ Q ⇔ Q ↔ P. 
11. P ↔ Q ⇔ (P → Q) ∧ (Q → P). 
12. ¬(P ∧ Q) ⇔ ¬P ∨ ¬Q (De Morgan’s Law). 
13. ¬(P ∨ Q) ⇔ ¬P ∧ ¬Q (De Morgan’s Law). 
14. ¬(P → Q) ⇔ P ∧ ¬Q. 
15. ¬(P ↔ Q) ⇔ (P ∧ ¬Q) ∨ (¬P ∧ Q). 


Demonstration. Part (9) was discussed previously. We will show here that Part (7) 
holds, leaving the rest to the reader in Exercise 1.3.7. The demonstration here is very 
similar to the demonstration of Fact 1.3.1 (1). 


(7). We need to demonstrate that P ∨ (Q ∧ R) ⇔ (P ∨ Q) ∧ (P ∨ R), which we 
do by showing that the statement [P ∨ (Q ∧ R)] ↔ [(P ∨ Q) ∧ (P ∨ R)] is a tautology, 
which in turn we do with the truth table 


    P | Q | R | [P  ∨  (Q  ∧  R)]  ↔  [(P  ∨  Q)  ∧  (P  ∨  R)]
    T | T | T |  T  T   T  T  T    T    T  T  T   T   T  T  T
    T | T | F |  T  T   T  F  F    T    T  T  T   T   T  T  F
    T | F | T |  T  T   F  F  T    T    T  T  F   T   T  T  T
    T | F | F |  T  T   F  F  F    T    T  T  F   T   T  T  F
    F | T | T |  F  T   T  T  T    T    F  T  T   T   F  T  T
    F | T | F |  F  F   T  F  F    T    F  T  T   F   F  F  F
    F | F | T |  F  F   F  F  T    T    F  F  F   F   F  T  T
    F | F | F |  F  F   F  F  F    T    F  F  F   F   F  F  F

                 1  5   2  4  3   13    6  8  7  12   9 11 10


We see in Column 13 that the statement [P ∨ (Q ∧ R)] ↔ [(P ∨ Q) ∧ (P ∨ R)] is 
always true, and hence it is a tautology. /// 
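
As with Fact 1.3.1, the fifteen equivalences of Fact 1.3.2 can be spot-checked by enumerating truth values. The Python sketch below is offered only as an illustration under the assumption that each part is encoded as a pair of functions of P, Q and R; the names `EQUIVALENCES`, `cond`, `bicond` and `equivalent` are hypothetical. Two statements are equivalent exactly when their truth values agree in every row, which is the same as their biconditional being a tautology.

```python
from itertools import product

def cond(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def bicond(a, b):
    """Truth value of the biconditional a <-> b."""
    return a == b

# Left and right sides of each part of Fact 1.3.2, as functions of P, Q, R.
EQUIVALENCES = {
    1: (lambda p, q, r: not (not p), lambda p, q, r: p),
    2: (lambda p, q, r: p or q, lambda p, q, r: q or p),
    3: (lambda p, q, r: p and q, lambda p, q, r: q and p),
    4: (lambda p, q, r: (p or q) or r, lambda p, q, r: p or (q or r)),
    5: (lambda p, q, r: (p and q) and r, lambda p, q, r: p and (q and r)),
    6: (lambda p, q, r: p and (q or r), lambda p, q, r: (p and q) or (p and r)),
    7: (lambda p, q, r: p or (q and r), lambda p, q, r: (p or q) and (p or r)),
    8: (lambda p, q, r: cond(p, q), lambda p, q, r: (not p) or q),
    9: (lambda p, q, r: cond(p, q), lambda p, q, r: cond(not q, not p)),
    10: (lambda p, q, r: bicond(p, q), lambda p, q, r: bicond(q, p)),
    11: (lambda p, q, r: bicond(p, q), lambda p, q, r: cond(p, q) and cond(q, p)),
    12: (lambda p, q, r: not (p and q), lambda p, q, r: (not p) or (not q)),
    13: (lambda p, q, r: not (p or q), lambda p, q, r: (not p) and (not q)),
    14: (lambda p, q, r: not cond(p, q), lambda p, q, r: p and (not q)),
    15: (lambda p, q, r: not bicond(p, q),
         lambda p, q, r: (p and (not q)) or ((not p) and q)),
}

def equivalent(left, right):
    """True if left and right have the same truth value in all 8 rows."""
    return all(left(*v) == right(*v)
               for v in product([True, False], repeat=3))

# Collect the numbers of any parts that fail; none should.
failed = [n for n, (a, b) in EQUIVALENCES.items() if not equivalent(a, b)]
print(failed)  # []
```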


Part (1) of Fact 1.3.2 might appear innocuous, but this equivalence plays a very 
important role in standard mathematical proofs. In informal terms, the equivalence 
of ¬(¬P) and P means that “two negatives cancel each other out.” From the point 
of view of constructing mathematical proofs, suppose that we want to show that a 
statement P is true. One method to prove this statement would be to hypothesize that 
¬P is true, and derive a contradiction. It would then follow that ¬P is false, which 
implies that ¬(¬P) is true. Because ¬(¬P) and P are equivalent, it would follow 
that P is true. This methodology of proof might sound rather convoluted, but it is 
often quite useful, and is called proof by contradiction. A detailed discussion of this 
method of proof is in Section 2.3. 

Part (11) of Fact 1.3.2 gives a reformulation of the biconditional in terms of 
conditionals. For example, the statement “I will play the flute today if and only if I 
listen to the radio” is equivalent to the statement “if I play the flute today I will listen 
to the radio, and if I listen to the radio I will play the flute today.” The equivalence 
of P ↔ Q and (P → Q) ∧ (Q → P) says that to prove a statement of the form P ↔ Q, 
it is sufficient to prove (P → Q) ∧ (Q → P); it therefore suffices to prove each of 
(P → Q) and (Q → P). As we will see in Chapter 2, the most basic type of statement 
that is proved in mathematics is a conditional statement. Hence, when we want to 
prove a theorem with a statement that is a biconditional, we will often prove the two 
corresponding conditional statements instead. See Section 2.4 for more discussion. 

Part (9) of Fact 1.3.2 allows us to reformulate one conditional statement in terms 
of another. For example, the statement “if it snows today, Yolanda will wash her 
clothes” is equivalent to “if Yolanda did not wash her clothes, it did not snow today.” 
Suppose that we know that the statement “if it snows today, Yolanda will wash her 
clothes” is true. Suppose further that in fact Yolanda did not wash her clothes. Then 
it could not have snowed, because if it had snowed, then surely Yolanda would have 
washed her clothes. On the other hand, if Yolanda did wash her clothes, we could not 
automatically conclude that it snowed, because Yolanda might choose to wash her 
clothes even when it does not snow. Therefore “if Yolanda did not wash her clothes, 
it did not snow today” must be true whenever “if it snows today, Yolanda will wash 
her clothes” is true. Similar reasoning shows that if the latter statement is true, then 
so is the former. 

Because the equivalence of the statements P → Q and ¬Q → ¬P will be so impor- 
tant for constructing mathematical proofs, as seen in Section 2.3, relevant terminol- 
ogy is merited. Given a conditional statement of the form P → Q, we call ¬Q → ¬P 
the contrapositive of the original statement. For example, the contrapositive of “if 
I eat too much I will feel sick” is “if I do not feel sick I did not eat too much.” 
Fact 1.3.2 (9) says that a statement and its contrapositive are always equivalent. 

We also give names to two other variants of statements of the form P → Q. We 
call Q → P the converse of the original statement, and we call ¬P → ¬Q the inverse 
of the original statement. Continuing the example of the previous paragraph, the con- 
verse of “if I eat too much I will feel sick” is “if I feel sick then I ate too much”; the 
inverse of the original statement is “if I did not eat too much then I will not feel sick.” 
It is important to recognize that neither the converse nor the inverse is equivalent to 
the original statement, as the reader can verify by constructing the appropriate truth 
tables. If we look at the statements “if I feel sick then I ate too much” and “if I did not 
eat too much then I will not feel sick,” we observe that both of them mean that there 
is no other possible cause of feeling sick than eating too much, whereas the original 
statement “if I eat too much I will feel sick” says nothing of the sort. Although the 
converse and inverse of a statement are not equivalent to the original statement, 
the converse and the inverse are equivalent to each other, 
as can be seen by applying Fact 1.3.2 (9) to the statement Q → P. 
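
The relations among a conditional, its contrapositive, its converse and its inverse can be summarized by comparing truth values row by row. The following Python sketch is an illustration only; the helper names `cond` and `equivalent` and the dictionary `results` are assumptions made here, not notation from the text.

```python
from itertools import product

def cond(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def equivalent(left, right):
    """True if left and right agree for every pair of truth values (P, Q)."""
    return all(left(p, q) == right(p, q)
               for p, q in product([True, False], repeat=2))

original = lambda p, q: cond(p, q)                   # P -> Q
contrapositive = lambda p, q: cond(not q, not p)     # not Q -> not P
converse = lambda p, q: cond(q, p)                   # Q -> P
inverse = lambda p, q: cond(not p, not q)            # not P -> not Q

results = {
    "original vs contrapositive": equivalent(original, contrapositive),
    "original vs converse": equivalent(original, converse),
    "original vs inverse": equivalent(original, inverse),
    "converse vs inverse": equivalent(converse, inverse),
}
for pair, same in results.items():
    print(pair, same)
```

Only the first and last comparisons report `True`, matching the discussion above: the inverse is the contrapositive of the converse.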

One important use of equivalences of statements is to find convenient formu- 
las for the negations of statements. Such formulas are found in Parts (12)—(15) of 
Fact 1.3.2, which show how to negate conjunctions, disjunctions, conditionals and 
biconditionals. For example, what is the negation of the statement “it is raining and I 
am happy”? We could write “it is not the case that it is raining and I am happy,” but 
that is cumbersome, and slightly ambiguous (does the phrase “it is not the case that” 
apply only to “it is raining,” or also to “I am happy”?). A common error would be to 
say “it is not raining and I am unhappy.” Observe that the original statement “it is 
raining and I am happy” is true if and only if both “it is raining” is true and “I am 
happy” is true. If either of these two component statements is false, then the whole 
original statement is false. Hence, to negate “it is raining and I am happy,” it is not 
necessary to negate both component statements, but only to know that at least one of 
them is false. Hence the correct negation of “it is raining and I am happy” is “it is not 
raining or I am unhappy.” A similar phenomenon occurs when negating a statement 
with “or” in it. The precise formulation of these ideas, known as De Morgan’s Laws, 
is given in Fact 1.3.2 (12) (13). 

What is the negation of the statement “if it snows, I will go outside”? As before, 
we could write “it is not the case that if it snows, I will go outside,” and again that 
would be cumbersome. A common error would be to say “if it snows, I will not go 
outside.” To see that this latter statement is not the negation of the original statement, 
suppose that “it snows” is false, and “I will go outside” is true. Then both “if it snows, 
I will go outside” and “if it snows, I will not go outside” are true, so the latter is not 
the negation of the former. The original statement “if it snows, I will go outside” is 
true if and only if “I will go outside” is true whenever “it snows” is true. The negation 
of the original statement therefore holds whenever “it snows” is true and “I will go 
outside” is false; that is, whenever the statement “it snows and I will not go outside” 
is true. The precise formulation of this observation is Fact 1.3.2 (14). 
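
A candidate negation can be tested directly: it is a correct negation exactly when its truth value differs from that of the original statement in every row. The Python sketch below illustrates this for the two examples just discussed; the function name `is_negation` and the variable names are assumptions introduced for the illustration.

```python
from itertools import product

def cond(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def is_negation(statement, candidate):
    """True if candidate has the opposite truth value to statement in every row."""
    return all(statement(p, q) != candidate(p, q)
               for p, q in product([True, False], repeat=2))

# P = "it is raining", Q = "I am happy".
raining_and_happy = lambda p, q: p and q
and_common_error = is_negation(raining_and_happy, lambda p, q: (not p) and (not q))
and_de_morgan = is_negation(raining_and_happy, lambda p, q: (not p) or (not q))

# P = "it snows", Q = "I will go outside".
snow_outside = lambda p, q: cond(p, q)
if_common_error = is_negation(snow_outside, lambda p, q: cond(p, not q))
if_correct = is_negation(snow_outside, lambda p, q: p and (not q))

print(and_common_error, and_de_morgan)  # False True
print(if_common_error, if_correct)      # False True
```

The common errors fail because there is at least one row in which the original statement and the proposed negation have the same truth value, for example the row where “it snows” is false and “I will go outside” is true.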


Exercises 


Exercise 1.3.1. Let P, Q, R and S be statements. Show that the following are true. 


(1) ¬(P → Q) ⇒ P. 
(2) (P → Q) ∧ (P → ¬Q) ⇒ ¬P. 
(3) P → Q ⇒ (P ∧ R) → (Q ∧ R). 
(4) P ∧ (Q ↔ R) ⇒ (P ∧ Q) ↔ R. 
(5) P → (Q ∧ R) ⇒ (P ∧ Q) ↔ (P ∧ R). 
(6) (P ↔ R) ∧ (Q ↔ S) ⇒ (P ∨ Q) ↔ (R ∨ S). 


Exercise 1.3.2. [Used in Exercise 1.3.12 and Section 2.4.] Let P, Q, A and B be state- 
ments. Show that the following are true. 


(1) P ⇔ P ∨ (P ∧ Q). 
(2) P ⇔ P ∧ (P ∨ Q). 
(3) P ↔ Q ⇔ (P → Q) ∧ (¬P → ¬Q). 
(4) P → (A ∧ B) ⇔ (P → A) ∧ (P → B). 
(5) P → (A ∨ B) ⇔ (P ∧ ¬A) → B. 
(6) (A ∨ B) → Q ⇔ (A → Q) ∧ (B → Q). 
(7) (A ∧ B) → Q ⇔ (A → Q) ∨ (B → Q). 
(8) (A ∧ B) → Q ⇔ A → (B → Q). 


Exercise 1.3.3. Let P be a statement, let T be a tautology and let C be a contradiction. 


(1) Show that P ∧ T ⇔ P. 
(2) Show that P ∨ C ⇔ P. 


Exercise 1.3.4. For each pair of statements, determine whether or not the first im- 
plies the second. 


(1) “If you will kiss me I will dance a jig, and I will dance a jig”; and “you will 
kiss me.” 

(2) “Yolanda has a cat and a dog, and Yolanda has a python”; and “Yolanda has 
a dog.” 

(3) “If cars pollute then we are in trouble, and cars pollute”; and “we are in 
trouble.” 

(4) “Our time is short or the end is near, and doom is impending”; and “the end 
is near.” 

(5) “Vermeer was a musician or a painter, and he was not a musician”; and “Ver- 
meer was a painter.” 

(6) “If I eat frogs’ legs I will get sick, or if I eat snails I will get sick”; and “if I 
eat frogs’ legs or snails I will get sick.” 


Exercise 1.3.5. For each pair of statements, determine whether or not the two state- 
ments are equivalent. 


(1) “If it rains, then I will see a movie”; and “it is not raining or I will see a 
movie.” 

(2) “This shirt has stripes, and it has short sleeves or a band collar”; and “this 
shirt has stripes and it has short sleeves, or it has a band collar.” 

(3) “It is not true that I like apples and oranges”; and “I do not like apples and I 
do not like oranges.” 

(4) “The cat is gray, or it has stripes and speckles”; and “the cat is gray or it has 
stripes, and the cat is gray or it has speckles.” 

(5) “It is not the case that: melons are ripe if and only if they are soft to the 
touch”; and “melons are ripe and soft to the touch, or they are not ripe or not 
soft to the touch.” 


Exercise 1.3.6. [Used in Fact 1.3.1.] Prove Fact 1.3.1 (2) (3) (4) (5) (6) (7) (8) (9) 
(10) (11) (12) (13). 


Exercise 1.3.7. [Used in Fact 1.3.2.] Prove Fact 1.3.2 (1) (2) (3) (4) (5) (6) (8) (10) 
(11) (12) (13) (14) (15). 


Exercise 1.3.8. State the inverse, converse and contrapositive of each of the follow- 
ing statements. 


(1) If it’s Tuesday, it must be Belgium. 

(2) I will go home if it is after midnight. 

(3) Good fences make good neighbors. 

(4) Lousy food is sufficient for a quick meal. 
(5) If you like him, you should give him a hug. 

Exercise 1.3.9. For each of the following pairs of statements, determine whether the 
second statement is the inverse, converse or contrapositive of the first statement, or 
none of these. 


(1) “If I buy a new book, I will be happy”; and “If I do not buy a new book, I will 
be unhappy.” 

(2) “I will be cold if I do not wear a jacket”; and “I will not be cold if I do not 
wear a jacket.” 

(3) “If you smile a lot, your mouth will hurt”; and “If your mouth hurts, you will 
smile a lot.” 

(4) “A warm house implies a warm bathroom”; and “A cold bathroom implies a 
cold house.” 

(5) “Eating corn implies that I will have to floss my teeth”; and “Not having to 
floss my teeth implies that I will eat corn.” 

(6) “Going to the beach is sufficient for me to have fun”; and “Not going to the 
beach is sufficient for me not to have fun.” 


Exercise 1.3.10. Negate each of the following statements. 


(1) &>0. 
(2) 3 < 5 or 7 > 8. 
(3) sin(4) < 0 and tan(0) > 0. 
(4) If y = 3 then y² = 7. 
(5) w - 3 > 0 implies w² + 9 > 6w. 
(6) a - b = c if and only if a = b + c. 


Exercise 1.3.11. Negate each of the following statements. 


(1) It is Monday and it is snowing. 

(2) This book is red or it was written in 1997. 

(3) Susan likes to eat figs and drink prune juice. 

(4) If I tell you a joke, you will smile. 

(5) The play will end on time if and only if the actors are in good spirits. 
(6) The room will get painted if you buy the paint. 


Exercise 1.3.12. Simplify the following statements. You can make use of the equiv- 
alences in Exercise 1.3.2 in addition to the equivalences discussed in the text. 


(1) ¬(P → ¬Q). 
(2) A → (A ∧ B). 
(3) (X ∧ Y) → X. 
(4) ¬(M ∨ L) ∧ L. 
(5) (P → Q) ∨ Q. 
(6) ¬(X → Y) ∨ Y. 


Exercise 1.3.13. [Used in Example 6.3.5.] This exercise is related to switching cir- 
cuits, which are the basis for computer technology. See Example 6.3.5 for further 
discussion and references. 


(1) The operations ∧ and ∨ are examples of binary logical operations, in that 
they take two inputs and give one output; the operation ¬ is an example of a 
unary logical operation, in that it takes one input and gives one output. How 
many possible unary and binary logical operations are there? List all of them 
using truth tables, and give the familiar names to those that we have already 
seen. 

(2) Show that all the operations you found in Part (1) can be obtained by combi- 
nations of ∧ and ¬ operations. 

(3) Let ⊼ be the binary logical operation, often referred to as nand, defined by 
the truth table 


    P | Q | P ⊼ Q
    T | T |   F
    T | F |   T
    F | T |   T
    F | F |   T


It is straightforward to verify that P ⊼ Q ⇔ ¬(P ∧ Q). Show that all the opera- 
tions you found in Part (1) can be obtained by combinations of ⊼ operations. 


1.4 Valid Arguments 


In the previous sections of this chapter we looked at statements from the point of 
view of truth and falsity. We verified the truth or falsity of statements via truth ta- 
bles, which allowed us to consider all possible ways in which various component 
statements might be true or false. This approach, while the most basic way to treat 
the truth or falsity of statements, does not appear to resemble the way mathemati- 
cians prove theorems, which is by starting with the hypotheses, and then writing one 
new statement at a time, each of which is implied by the previous statements, until 
the conclusion is reached. In this section we look at the analogous construction in 
logic, that is, the rules of logical argumentation, and we will see the relation of this 
approach to what was discussed in the previous sections of this chapter. 

When we turn to the formulation of mathematical proofs in Chapter 2, we will 
be focusing on the mathematical content of our proofs, and we will not explicitly 
refer to the rules of logical argumentation discussed in the present section—doing 
so would be a distraction from the mathematical issues involved. We will also not 
be using the logical notation of the present section in future chapters. Nonetheless, 
we will be using the rules of logical argumentation implicitly all the time. For a 
mathematician these rules of logic are somewhat similar to a body builder’s relation 
to the skeleton of the human body—you do not always think about it explicitly as 
you do your work, but it is the framework upon which all is built. 

Consider the following collection of statements, which has a number of premises 
together with a conclusion. 


If the poodle-o-matic is cheap or is energy efficient, then it will not make 
money for the manufacturer. If the poodle-o-matic is painted red, then it will 
make money for the manufacturer. The poodle-o-matic is cheap. Therefore 
the poodle-o-matic is not painted red. 


This collection of statements is an example of a logical argument, which in general 
is a collection of statements, the last of which is the conclusion of the argument, 
and the rest of which are the premises of the argument. Clearly, the use of the word 
“argument” in logic is different from the colloquial use of the word, where it could 
mean the reasons given for thinking that something is true, or it could mean a heated 
(and not necessarily logical) discussion. 

An argument is a collection of statements that are broken up into premises and 
a conclusion. Of course, a random collection of statements, in which there is no in- 
herent connection between those designated as premises and the one designated as 
conclusion, will not be of much use. An argument is valid if the conclusion necessar- 
ily follows from the premises. Thinking about the notion of logical implication used 
in Section 1.3, we can say that an argument is valid if we cannot assign truth values 
to the component statements used in the argument in such a way that the premises 
are all true but the conclusion is false. To a mathematician, what logicians call an 
argument would simply correspond to the statement of a theorem; the justification 
that an argument is valid would correspond to what mathematicians call the proof of 
the theorem. 

How can we show that our sample argument given above is valid? We start by 
converting the argument to symbols. Let C = “the poodle-o-matic is cheap,” let E = 
“the poodle-o-matic is energy efficient,” let M = “the poodle-o-matic makes money 
for the manufacturer” and let R = “the poodle-o-matic is painted red.” The argument 
then becomes 


    (C ∨ E) → ¬M
    R → M
    C
    ------------
    ¬R,


where the horizontal line separates the premises from the conclusion. Alternatively, 
in keeping with our notation from Section 1.3, we could write this argument as [(C ∨ 
E) → ¬M] ∧ (R → M) ∧ C ⇒ ¬R. 

Considering the last way we wrote our argument, we could attempt to show that 
it is valid just as we showed that certain logical implications were true in Section 1.3, 
that is, by showing that the statement {[(C ∨ E) → ¬M] ∧ (R → M) ∧ C} → ¬R is a 
tautology, which we could accomplish by using a truth table. This method would 
indeed work, but it would be neither pleasant nor helpful. First, given that there are 
four statements involved, the needed truth table would have 16 rows, which would 
be somewhat tedious. For even more complicated arguments, the truth tables would 
have to be even larger. Second, using a truth table gives no intuitive insight into why 
the argument is valid. Finally, when proving mathematical statements, we often use 
quantifiers (as described in Section 1.5), which make truth tables virtually impossible 
to use. Mathematical proofs (except perhaps in the field of logic) are never done with 
truth tables. 
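
For comparison with the derivations that follow, the 16-row truth table check described above can be delegated to a computer. The following Python sketch is illustrative only; the function name `argument_is_valid` and the encoding of the premises as functions of C, E, M and R are assumptions made here. It searches for a row in which all premises are true and the conclusion is false.

```python
from itertools import product

def cond(a, b):
    """Truth value of the conditional a -> b."""
    return (not a) or b

def argument_is_valid(premises, conclusion, num_vars):
    """Valid means: no row makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

# C = cheap, E = energy efficient, M = makes money, R = painted red.
premises = [
    lambda c, e, m, r: cond(c or e, not m),  # (C or E) -> not M
    lambda c, e, m, r: cond(r, m),           # R -> M
    lambda c, e, m, r: c,                    # C
]
conclusion = lambda c, e, m, r: not r        # not R

rows = len(list(product([True, False], repeat=4)))
valid = argument_is_valid(premises, conclusion, 4)
print(rows, valid)  # 16 True
```

The check confirms validity but, as noted above, gives no insight into why the conclusion follows.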

Instead of using truth tables, we will try to justify the validity of arguments by 
making use of what we learned in Section 1.3 about logical implication. If we want 
to show that a complicated logical implication holds, perhaps we could do so by 
breaking it down into a collection of simpler implications, taken one at a time. If 
the simpler implications are already known, then they could be building blocks for 
the more complicated implication. Some of the standard simple implications that we 
use, known as rules of inference, are listed below. Most of these simple implications 
should be familiar; they were proved in Fact 1.3.1, although we are stating them in 
a different format here, to conform to the notation used for logical arguments. 


    Modus Ponens       P → Q        Modus Tollendo Ponens       P ∨ Q
                       P                                        ¬P
                       -----                                    -----
                       Q                                        Q

    Modus Tollens      P → Q        Modus Tollendo Ponens       P ∨ Q
                       ¬Q                                       ¬Q
                       -----                                    -----
                       ¬P                                       P

    Double Negation    ¬¬P          Biconditional-Conditional   P ↔ Q
                       -----                                    -----
                       P                                        P → Q

    Double Negation    P            Biconditional-Conditional   P ↔ Q
                       -----                                    -----
                       ¬¬P                                      Q → P

    Repetition         P            Conditional-Biconditional   P → Q
                       -----                                    Q → P
                       P                                        -----
                                                                P ↔ Q

    Simplification     P ∧ Q        Hypothetical Syllogism      P → Q
                       -----                                    Q → R
                       P                                        -----
                                                                P → R

    Simplification     P ∧ Q        Constructive Dilemma        P → Q
                       -----                                    R → S
                       Q                                        P ∨ R
                                                                -----
                                                                Q ∨ S

    Adjunction         P
                       Q
                       -----
                       P ∧ Q

    Addition           P
                       -----
                       P ∨ Q

    Addition           Q
                       -----
                       P ∨ Q


The names for some of the above rules of inference, such as modus ponens, 
are quite standard; a few of the rules of inference have slightly different names in 
different texts. There are more rules of inference, but the ones listed above suffice 
for our purposes. See [KMM80] for a thorough discussion of rules of inference. 

A few of the rules of inference listed above were not treated in Fact 1.3.1, al- 
though they are easily seen to be true. Double Negation is proved in Fact 1.3.2, 
although here we state it as two implications, rather than one equivalence. Repetition 
is evidently true (because P — P is a tautology), but is still useful as a rule of infer- 
ence. Adjunction is just a glorified version of repetition, because if we stated it in the 
format of Fact 1.3.1, it would look like P ∧ Q ⇒ P ∧ Q. 

We now return to our argument concerning the poodle-o-matic. Using the rules 
of inference listed above, we can construct a justification for the argument. We use 
here the two-column format that may be familiar from high school geometry proofs, 
in which each line is labeled by a number, and is given a justification for why it 
is true in terms of previous lines and rules of inference; no justification is needed 
for the premises. (We will not, it is worth noting, use this two-column format in 
mathematical proofs, starting in Chapter 2.) Our justification for the argument is 


    (1) (C ∨ E) → ¬M
    (2) R → M
    (3) C
    (4) C ∨ E             (3), Addition
    (5) ¬M                (1), (4), Modus Ponens
    (6) ¬R                (2), (5), Modus Tollens.
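
Because each line of a derivation cites a rule of inference and earlier lines, a derivation can be checked step by step without any truth tables. The Python sketch below illustrates the idea for the derivation above; it is a hypothetical mini-checker covering only the three rules used here, and the formula encoding (`Not`, `Or`, `Imp` as tagged tuples) and all function names are assumptions made for the illustration.

```python
# Formulas are atoms (strings) or tagged tuples built by these helpers.
def Not(a): return ("not", a)
def Or(a, b): return ("or", a, b)
def Imp(a, b): return ("imp", a, b)

def is_op(formula, tag):
    return isinstance(formula, tuple) and formula[0] == tag

def addition(line, cited):
    """From P infer P or Q (or Q or P)."""
    (p,) = cited
    return is_op(line, "or") and p in (line[1], line[2])

def modus_ponens(line, cited):
    """From P -> Q and P infer Q; the cited lines may come in either order."""
    a, b = cited
    return a == Imp(b, line) or b == Imp(a, line)

def modus_tollens(line, cited):
    """From P -> Q and not Q infer not P; either order of cited lines."""
    for conditional, negation in (cited, cited[::-1]):
        if (is_op(conditional, "imp") and negation == Not(conditional[2])
                and line == Not(conditional[1])):
            return True
    return False

def check_derivation(derivation):
    """Each entry is (formula, rule or None for a premise, cited line numbers)."""
    for number, (formula, rule, refs) in enumerate(derivation, start=1):
        if rule is None:
            continue  # premises need no justification
        if any(ref >= number for ref in refs):
            return False  # only earlier lines may be cited
        cited = [derivation[ref - 1][0] for ref in refs]
        if not rule(formula, cited):
            return False
    return True

C, E, M, R = "C", "E", "M", "R"
derivation = [
    (Imp(Or(C, E), Not(M)), None, ()),   # (1) premise
    (Imp(R, M), None, ()),               # (2) premise
    (C, None, ()),                       # (3) premise
    (Or(C, E), addition, (3,)),          # (4)
    (Not(M), modus_ponens, (1, 4)),      # (5)
    (Not(R), modus_tollens, (2, 5)),     # (6)
]
ok = check_derivation(derivation)

# Mislabeling the last step as Modus Ponens is rejected.
bad = derivation[:5] + [(Not(R), modus_ponens, (2, 5))]
bad_ok = check_derivation(bad)
print(ok, bad_ok)  # True False
```

The checker only confirms that each step matches the pattern of its cited rule; it says nothing about whether the premises themselves are true.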


This sort of justification, often referred to by logicians as a derivation, is a chain 
of statements connected by meta-statements (namely, the justifications for each line). 
If an argument has a derivation, we say that the argument is derivable. Observe 
that the derivability of an argument is one thing, and the truth of the component 
statements involved is another. We can have a derivable argument with component 
statements that happen to be true, or happen to be false, and we can have a non- 
derivable argument with component statements that happen to be true, or happen 
to be false. The derivability of an argument is only a question of the relation of 
the conclusion of the argument with the premises, not whether the conclusion or 
premises are actually true. 

For a given argument, there is often more than one possible derivation. The fol- 
lowing is another derivation for the poodle-o-matic argument, this time making use 
of the equivalences of statements given in Fact 1.3.2, in addition to our rules of infer- 
ence. In general, it is acceptable in a derivation to replace one statement with another 
that is equivalent to it. The alternative derivation is 


(1) (C ∨ E) → ¬M
(2) R → M
(3) C
(4) C ∨ E            (3), Addition
(5) ¬M → ¬R          (2), Contrapositive
(6) (C ∨ E) → ¬R     (1), (5), Hypothetical Syllogism
(7) ¬R               (4), (6), Modus Ponens.


This alternative derivation happens to be longer than the previous one, but our pur- 
pose here is only to show that alternatives exist, not to find the most efficient deriva- 
tion. 

We now face an important question. Given an argument, we have two notions 
of whether the argument works: whether it is valid, and whether it is 
derivable. The former notion involves checking truth values (which is done 
with truth tables), the latter constructing a chain of statements linked by rules of in- 
ference. What is the relation between these two approaches? Though it is not at all 
obvious, nor easy to prove, it turns out quite remarkably that these two approaches, 
while different in nature, always yield the same result. That is, an argument is valid 
if and only if it is derivable. Hence, if we want to show that a given argument is 
valid, it will suffice to show that it is derivable, and vice versa. The equivalence of 
these two approaches is a major result in logic. That validity implies derivability is 
often referred to as the “Completeness Theorem,” and that derivability implies valid- 
ity is often referred to as the “Soundness Theorem” or “Correctness Theorem.” See 
[End72, Section 25] and [EFT94, Chapters 4 and 5] for details. (Different treatments 
of this subject might use different collections of rules of inference, but the basic ideas 
are the same.) 
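
Because validity is defined in terms of truth values, the validity of the poodle-o-matic argument can also be checked mechanically, which complements the derivations given above. The following Python sketch enumerates all truth assignments; the helper names `implies` and `is_valid` are our own choices for illustration and do not come from the text.

```python
from itertools import product

def implies(p, q):
    """Truth value of the conditional p -> q."""
    return (not p) or q

def is_valid(premises, conclusion, num_vars):
    """An argument is valid if no truth assignment makes every
    premise true while making the conclusion false."""
    for values in product([True, False], repeat=num_vars):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False
    return True

# Poodle-o-matic argument, with variables in the order C, E, M, R.
premises = [
    lambda C, E, M, R: implies(C or E, not M),  # (C v E) -> not M
    lambda C, E, M, R: implies(R, M),           # R -> M
    lambda C, E, M, R: C,                       # C
]
conclusion = lambda C, E, M, R: not R           # not R

print(is_valid(premises, conclusion, 4))  # True
```

The check inspects all 16 rows of the truth table, whereas a derivation reaches the same verdict in a handful of steps.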

From the above considerations we see that to show that a given argument is valid, 
we simply need to find a derivation, which is often a much more pleasant prospect 
than showing validity directly. To show that a given argument is invalid, however, 
derivations are not much help, because we would need to show that no derivation 
could possibly be found. It would not suffice to say that you tried your best to find a 
derivation but could not find one, because you cannot be sure that you have not sim- 
ply overlooked a derivation that works. Rather, to show that an argument is invalid, 
we use the definition of validity directly, and we find some truth values for the com- 
ponent statements of the argument that make the premises all true but the conclusion 
false. 

Consider the following argument. 


If aliens land on planet Earth, then all people will buy flowers. If Earth 
receives signals from outer space, then all people will grow long hair. Aliens 
land on Earth, and all people are growing long hair. Therefore all people buy 
flowers, and the Earth receives signals from outer space. 


This argument is invalid, which we can see as follows. Let A = “aliens land on planet 
Earth,” let R = “all people buy flowers,” let S = “Earth receives signals from outer 
space” and let H = “all people grow long hair.” The argument then becomes 


A → R
S → H
A ∧ H
-----
R ∧ S.


Suppose that A is true, that R is true, that S is false and that H is true. Then A → R and 
S → H and A ∧ H are all true, but R ∧ S is false. Therefore the premises are all true 
but the conclusion is false, which means that the argument is invalid. For some other 
combinations of A, R, S and H being true or false, it works out that the premises are 
all true and the conclusion is true, and for some combinations of A, R, S and H being 
true or false, it works out that the premises are not all true (in which case it does not 
matter whether the conclusion is true or false for the conclusion to be implied by the 
premises). Nonetheless, the existence of at least one set of truth values for A, R, S 
and H for which the premises are all true but the conclusion is false is sufficient to 
cause the argument to be invalid. 
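
The search for truth values that make the premises true and the conclusion false can be carried out exhaustively. The sketch below, in which the helper `implies` and the list name `counterexamples` are our own, lists every such assignment for the argument about aliens.

```python
from itertools import product

def implies(p, q):
    """Truth value of the conditional p -> q."""
    return (not p) or q

counterexamples = []
for A, R, S, H in product([True, False], repeat=4):
    premises_true = implies(A, R) and implies(S, H) and (A and H)
    conclusion_true = R and S
    if premises_true and not conclusion_true:
        counterexamples.append((A, R, S, H))

# Each tuple is an assignment (A, R, S, H) showing invalidity.
print(counterexamples)  # [(True, True, False, True)]
```

The single assignment found is the one used in the text: A, R and H true, and S false.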

We now look at a particular type of argument for which special care is needed. 
Before reading further, try to figure out what is strange about this argument. 


Jethro does not play the guitar, or Susan plays the flute. If Leslie does not 
play the xylophone, then Susan does not play the flute. Jethro plays the gui- 
tar, and Leslie does not play the xylophone. Therefore Ferdinand plays the 
accordion. 


The strange thing about this argument is that there is no apparent connection between 
the conclusion and the premises. However, try as you might, you will not be able to 
find truth values for the component statements used in the argument for which the 
premises are all true but the conclusion is false. The argument is in fact valid, as odd 
as that might appear. Let J = “Jethro plays the guitar,” let S = “Susan plays the flute,” 
let L = “Leslie plays the xylophone” and let F = “Ferdinand plays the accordion.” 
A derivation for this argument is 


(1) ¬J ∨ S
(2) ¬L → ¬S
(3) J ∧ ¬L
(4) J                (3), Simplification
(5) J ∨ F            (4), Addition
(6) ¬L               (3), Simplification
(7) ¬S               (2), (6), Modus Ponens
(8) ¬J               (1), (7), Modus Tollendo Ponens
(9) F                (5), (8), Modus Tollendo Ponens.


This derivation has no flaws, though there is still something suspicious about it. To 
see what is going on, consider the following derivation, which is also completely 
correct. 


(1) ¬J ∨ S
(2) ¬L → ¬S
(3) J ∧ ¬L
(4) J                (3), Simplification
(5) J ∨ ¬F           (4), Addition
(6) ¬L               (3), Simplification
(7) ¬S               (2), (6), Modus Ponens
(8) ¬J               (1), (7), Modus Tollendo Ponens
(9) ¬F               (5), (8), Modus Tollendo Ponens.


In other words, the same premises can be used to imply the negation of the conclusion 
in the original argument. 

How can it be that the same premises can imply a conclusion and its negation? 
The answer is that the premises themselves are no good, in that they form a 
contradiction (as defined in Section 1.2). In symbols, the premises are 
(¬J ∨ S) ∧ (¬L → ¬S) ∧ (J ∧ ¬L), and, as the reader can check with a truth table, 
this statement is a contradiction. The key to this 
strange state of affairs is the definition of the conditional. Recall that a statement of 
the form P — Q is always true whenever P is false, regardless of whether Q is true 
or false. So, if we have premises that form a contradiction, that is, they are always 
false, then we can logically derive any desired conclusion from these premises. 
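
The claim that the three premises about Jethro, Susan and Leslie form a contradiction can be confirmed by brute force. In the following sketch the names `implies`, `premises_hold` and `satisfying` are our own; the code searches for truth values of J, S and L under which all three premises are true.

```python
from itertools import product

def implies(p, q):
    """Truth value of the conditional p -> q."""
    return (not p) or q

def premises_hold(J, S, L):
    """(not J or S) and (not L -> not S) and (J and not L)."""
    return ((not J) or S) and implies(not L, not S) and (J and (not L))

satisfying = [v for v in product([True, False], repeat=3) if premises_hold(*v)]

print(satisfying)  # []: no assignment makes all premises true
```

Because no assignment makes the premises all true, no assignment can make the premises true and a conclusion false, which is why any conclusion whatsoever follows from them.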

The moral of this story is that we should avoid arguments that have premises that 
form contradictions. Such premises are often called inconsistent. Premises that are 
not inconsistent are called consistent. It is not that there is anything logically wrong 
with inconsistent premises; they are simply of no use to mathematicians, because we 
can derive anything from them. For example, when non-Euclidean geometry was first 
discovered in the early nineteenth century, it was important to determine whether the 
proposed axiom system for such geometry was consistent or not. In many mathemat- 
ical situations, for example geometry, it is not possible to demonstrate consistency 
directly via truth tables and the like, but it was eventually shown that non-Euclidean 
geometry is no less consistent than Euclidean geometry. Because Euclidean geometry is so well 
studied and so widely used, and its consistency is not generally doubted, it followed 
that non-Euclidean geometry was no less worthwhile mathematically than Euclidean 
geometry. See [Tru87, Chapter 7] for details. 

Whereas arguments with inconsistent premises are not logically flawed, but 
rather do not allow for any useful conclusions, we often do encounter logical er- 
rors in both formal and informal argumentation. We conclude this section with a 
brief mention of a few common logical errors, often referred to as fallacies, that are 
regularly found in attempted mathematical proofs (and elsewhere). 

The first two errors we mention involve applications of commonly used non- 
existent “rules of inference.” For example, consider the following argument. 


If Fred eats a good dinner, then he will drink a beer. Fred drank a beer. 
Therefore Fred ate a good dinner. 


This argument is definitely invalid. The first premise states that Fred will drink a 
beer if something happens, namely, if he eats a good dinner. It does not say that he 
would not drink a beer otherwise. Hence, just because we assume that Fred drank a 
beer, we cannot conclude anything about Fred’s dinner. In symbols, the argument is 
(P → Q) ∧ Q ⇒ P. There is no such implication, as can be seen by checking the truth 
table for [(P → Q) ∧ Q] → P, which is not a tautology. This fallacy is known as the 
fallacy of the converse (and is also known as the fallacy of affirming the consequent). 
Our next type of fallacy is seen in the following argument. 


If Senator Bullnose votes himself a raise, then he is a sleazebucket. Senator 
Bullnose did not vote himself a raise. Therefore the senator is not a sleaze- 
bucket. 


Again this argument is invalid. The first premise says what we could conclude if the 
senator does a certain thing, namely, votes himself a raise. It does not say anything 
if that certain thing does not happen. Therefore, just because the senator did not vote 
himself a raise, we cannot conclude anything about his character—there could be 
many other things that might raise questions about him. In symbols, the argument 
here is (P → Q) ∧ ¬P ⇒ ¬Q. Again, there is no such implication, as can be seen 
by checking the appropriate truth table. This fallacy is known as the fallacy of the 
inverse (and is also known as the fallacy of denying the antecedent). 
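
The truth tables mentioned for the fallacy of the converse and the fallacy of the inverse can be generated directly. The sketch below uses helper names of our own (`implies`, `failing_rows`) and reports the rows (P, Q) on which each formula is false; Modus Ponens is included only for comparison.

```python
from itertools import product

def implies(p, q):
    """Truth value of the conditional p -> q."""
    return (not p) or q

def failing_rows(formula):
    """Rows (P, Q) of the truth table on which the formula is false."""
    return [(P, Q) for P, Q in product([True, False], repeat=2)
            if not formula(P, Q)]

# Fallacy of the converse: [(P -> Q) and Q] -> P
converse = lambda P, Q: implies(implies(P, Q) and Q, P)
# Fallacy of the inverse: [(P -> Q) and not P] -> not Q
inverse = lambda P, Q: implies(implies(P, Q) and (not P), not Q)
# Modus Ponens, a genuine rule of inference: [(P -> Q) and P] -> Q
modus_ponens = lambda P, Q: implies(implies(P, Q) and P, Q)

print(failing_rows(converse))      # [(False, True)]
print(failing_rows(inverse))       # [(False, True)]
print(failing_rows(modus_ponens))  # []
```

Both fallacies fail on the same row, in which P is false and Q is true: Fred drank a beer without eating a good dinner, and the senator is a sleazebucket without having voted himself a raise.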

The third type of error we mention is of a slightly different nature. Consider the 
following argument. 


If Deirdre has hay fever, then she sneezes a lot. Therefore Deirdre sneezes a 
lot. 


The problem with this argument, which again is invalid, is not the use of an incorrect 
“rule of inference,” but rather the making of an unjustified assumption. If we were 
also to assume that in fact Deirdre has hay fever, then we could use Modus Ponens to 
conclude that she sneezes a lot. Without that assumption, however, no such conclu- 
sion can be drawn. This fallacy is known as the fallacy of unwarranted assumptions. 

The examples we just gave of fallacious arguments might seem so trivial that 
they are hardly worth dwelling on, not to mention give names to. They are ubiq- 
uitous, however, both in everyday usage (in political discussions, for example) and 
in mathematics classes, and are especially hard to spot when embedded in lengthier 
and more convoluted argumentation. Hence we alert you to them here. For further 
discussion of fallacies in formal and informal argumentation, see [KMMB80, Sec- 
tion 1.5]. For errors in argumentation involving not only logical mistakes but also 
rhetorical devices such as appeals to authority, irrelevant circumstances and abusive 
statements, see [Cop68, Chapter 3]. 


Exercises 


Exercise 1.4.1. For each of the following arguments, if it is valid, give a derivation, 
and if it is not valid, show why. 


(1) P ∧ Q                    (4) L → M
    (P ∨ Q) → R                  (M ∨ N) → (L → K)
    -----                        ¬P ∧ L
    R                            -----
                                 K

(2) …                        (5) …

(3) …                        (6) ¬A → (B → ¬C)
                                 …


Exercise 1.4.2. For each of the following arguments, if it is valid, give a derivation, 
and if it is not valid, show why. 


(1) If Fishville is boring, then it is hard to find. If Fishville is not small, then it is 
not hard to find. Fishville is boring. Therefore Fishville is small. 

(2) If the new CD by The Geeks is loud or tedious, then it is not long and not 
cacophonous. The new CD by The Geeks is tedious. Therefore the CD is not 
long. 

(3) If the food is green, then it is undercooked. If the food is smelly, then it is 
stale. The food is green or it is stale. Therefore the food is undercooked or it 
is smelly. 

(4) If Susan likes fish, then she likes onions. If Susan does not like garlic, then 
she does not like onions. If she likes garlic, then she likes guavas. She likes 
fish or she likes cilantro. She does not like guavas. Therefore, Susan likes 
cilantro. 

(5) It is not the case that Fred plays both guitar and flute. If Fred does not play 
guitar and he does not play flute, then he plays both organ and harp. If he 
plays harp, then he plays organ. Therefore Fred plays organ. 

(6) If you rob a bank, you go to jail. If you go to jail, you do not have fun. If 
you have a vacation, you have fun. You rob a bank or you have a vacation. 
Therefore you go to jail or you have fun. 


Exercise 1.4.3. Write a derivation for each of the following arguments, all of which 
are valid. State whether the premises are consistent or inconsistent. 


(1) If amoebas can dance, then they are friendly. If amoebas make people sick, 
then they are not friendly. Amoebas can dance and they make people sick. 
Therefore people are friendly. 

(2) If warthogs are smart, then they are interesting. Warthogs are not interesting 
or they are sneaky. It is not the case that warthogs are pleasant or not smart. 
Therefore warthogs are sneaky. 

(3) It is not the case that clothes are annoying or not cheap. Clothes are not cheap 
or they are unfashionable. If clothes are unfashionable they are silly. There- 
fore clothes are silly. 

(4) If music soothes the soul then souls have ears. Music soothes the soul or 
musicians are calm. It is not the case that souls have ears or musicians are 
calm. Therefore musicians have souls. 

(5) Computers are useful and fun, and computers are time consuming. If comput- 
ers are hard to use, then they are not fun. If computers are not well designed, 
then they are hard to use. Therefore computers are well designed. 

(6) If Marcus likes pizza then he likes beer. If Marcus likes beer then he does not 
like herring. If Marcus likes pizza then he likes herring. Marcus likes pizza. 
Therefore he likes herring pizza. 


Exercise 1.4.4. Find the fallacy, or fallacies, in each of the following arguments. 


(1) Good fences make good neighbors. Therefore we have good neighbors. 

(2) If Fred eats a frog then Susan will eat a snake. Fred does not eat a frog. 
Therefore Susan does not eat a snake. 

(3) The cow moos whenever the pig oinks. The cow moos. Therefore the pig 
oinks. 

(4) A nice day is sufficient for frolicking children or napping adults. Adults are 
napping. Therefore it is a nice day. 

(5) If my rabbit eats a hamburger, then she gets sick. If my rabbit gets sick, then 
she is unhappy. Therefore my rabbit gets sick. 


(6) If Snoozetown elects a mayor, then it will raise taxes. If Snoozetown does not 
raise taxes, then it will not build a new stadium. Snoozetown does not elect a 
mayor. Therefore it will not build a new stadium. 


1.5 Quantifiers 


Our discussion of logic so far has been missing one crucial ingredient used in the 
formulation of theorems and proofs. We often encounter in mathematics expressions 
such as “x³ > 8,” which we might wish to prove. This expression as written is not 
precise, however, because it does not state which possible values of x are under 
consideration. Indeed, the expression is not a statement. A more useful expression, which 
is a statement, would be “x³ > 8, for all real numbers x > 2.” The phrase “for all real 
numbers x > 2” is an example of a quantifier. The other type of quantifier commonly 
used is the first part of the statement “there exists a real number x such that x² = 9.” 
What is common to both these phrases is that they tell us about the variables under 
consideration; they tell us what the possible values of the variable are, and whether 
the statement involving the variable necessarily holds for all possible values of the 
variable or only for some values (that is, one or more values). 

The use of quantifiers vastly expands the range of possible statements that can be 
formed in comparison with the statements that were made in previous sections of this 
chapter. Quantifiers are so important that the type of logic that involves quantifiers 
has its own name, which is “first-order” (and is also known as “predicate”) logic; 
the type of logic we looked at previously is called “sentential” (and is also known as 
“propositional”) logic. 

Many statements of theorems in mathematics have quantifiers in them, sometimes 
multiple quantifiers. The importance of quantifiers in rigorous proofs cannot 
be overestimated. From the author’s experience teaching undergraduate mathemat- 
ics courses, confusion arising out of either the misunderstanding of quantifiers in 
complicated definitions and theorems, or the ignoring of quantifiers when writing 
proofs, is the single largest cause of problems for students who are learning to con- 
struct proofs. A solid understanding of how to use quantifiers is therefore well worth 
acquiring. 

Quantifiers can arise in a variety of statements. Consider the statement “some 
people in this room have red hair.” Though it might not appear so at first, this state- 
ment does inherently have a quantifier, because it could be rephrased as “there exists 
a person in this room who has red hair.” The statement “all cats like to eat all mice” 
has two quantifiers. We could rephrase this statement as “for each cat x, and each 
mouse y, cat x likes to eat mouse y.” The statement “every person has a mother” 
combines two different types of quantifiers, because it could be rephrased as “for 
each person A, there is a woman B such that B is the mother of A.” Of course, as 
with any other type of statement, a statement involving quantifiers is either true or 
false. The statement “every person has a mother” is true, whereas “every person has 
a sister” is false. 

Quantifiers often occur in both colloquial and mathematical statements, even 
when they are not mentioned explicitly. Non-explicit quantifiers in colloquial English 
can occasionally lead to some odd confusions. What does the sentence “someone is 
hit by a car every hour” mean? Does the same person keep getting hit every hour? In 
mathematics there is no room for ambiguous statements, and so when we attempt to 
prove a complicated mathematical statement, it is often useful to start by rephrasing 
it so as to make the quantifiers explicit. 

As a preliminary to our discussion of quantifiers, consider the expression P = 
“x + y > 0.” Observe that x and y have the same roles in P. Using P we can form a 
new expression Q = “for all positive real numbers x, the inequality x + y > 0 holds.” 
In contrast to P, there is a substantial difference between the roles of x and y in Q. The 
symbol x is called a bound variable in Q, in that we have no ability to choose which 
values of x we want to consider. By contrast, the symbol y is called a free variable 
in Q, because its possible values are not limited. Because y is a free variable in Q, it 
is often useful to write Q(y) instead of Q to indicate that y is free. In P both x and y 
are free variables, and we would denote that by writing P(x,y). 

The difference between a bound variable and a free one can be seen by changing 
the variables in Q. If we change every occurrence of x to w in Q, we obtain Q′ = 
“for all positive real numbers w, the inequality w + y > 0 holds.” For each possible 
value of y, we observe that Q′ and Q have precisely the same meaning. In other 
words, if Q were part of a larger expression, then the larger expression would be 
entirely unchanged by replacing Q with Q′. By contrast, suppose that we change 
every occurrence of y to z in Q, obtaining Q″ = “for all positive real numbers x, the 
inequality x + z > 0 holds.” Then Q″ does not have the same meaning as Q, because 
y and z (over which we have no control in Q and Q″ respectively) might be assigned 
different values, for example if Q were part of a larger expression that had both y 
and z appearing outside Q. In other words, changing the y to z made a difference 
precisely because y is a free variable in Q. 
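
The distinction between bound and free variables has a close analogue in programming, where a free variable behaves like a function parameter and a bound variable behaves like a loop variable. The sketch below is only an illustration under a simplifying assumption: the finite list `xs` is a hypothetical stand-in for the positive real numbers, which cannot be enumerated.

```python
# Hypothetical finite stand-in for "all positive real numbers x".
xs = [1, 2, 3]

def Q(y):
    """'For all x in xs, x + y > 0.' Here x is bound and y is free."""
    return all(x + y > 0 for x in xs)

def Q_renamed(y):
    """The same expression with the bound variable x renamed to w."""
    return all(w + y > 0 for w in xs)

# Renaming the bound variable never changes the result.
print(all(Q(y) == Q_renamed(y) for y in range(-5, 6)))  # True

# The truth value does depend on the free variable y.
print(Q(0), Q(-1))  # True False
```

Renaming the loop variable leaves the function unchanged, whereas the value supplied for the parameter y determines whether the result is true or false, just as an expression with a free variable has no truth value until that variable is specified.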

Observe that an expression with a free variable is not a statement. Our expression 
Q in the previous paragraph is not a statement because we cannot determine its truth 
or falsity without knowing something about the possible values of y under considera- 
tion. By contrast, the expression “for all positive real numbers x, and all real numbers 
y, the inequality x + y > 0 holds,” has no free variables, and it is indeed a statement 
(which happens to be false). 

We are now ready for a closer look at the two types of quantifiers that we will 
use. Let P(x) be an expression with free variable x. Let U denote a collection of 
possible values of x. A universal quantifier applied to P(x) is the statement, denoted 
(∀x in U)P(x), which is true if P(x) is true for all possible values of x in U. If the 
collection U is understood from the context, then we will write (∀x)P(x). 

One way to think of the statement (∀x in U)P(x) is to view it as the conditional 
statement “if x is in U, then P(x) is true.” As we saw in our discussion of conditional 
statements in Section 1.2, the truth of the statement “if x is in U, then P(x) is true” 
does not say anything about what happens if x is not in U. That is, if the statement 
(∀x in U)P(x) is true, it tells us only about P(x) when x is in U; it might or might not 
be the case that P(x) is true for some values of x that are not in U, but we cannot tell 
that from the statement as written. 

There are a variety of ways to write (∀x in U)P(x) in English, for example: 


For all values of x in U, the statement P(x) is true; 
For each x in U, the statement P(x) is true; 

The statement P(x) is true for all x in U; 

All values of x in U satisfy P(x). 


For example, let P(α) = “person α has red hair,” and let W be the collection of 
all people in the world. The statement (∀α in W)P(α) would mean that “all people 
in the world have red hair” (which is certainly not a true statement). Let S(n) = “n is 
a perfect square greater than 1,” and C(n) = “n is a composite number” (a composite 
number is an integer greater than 1 that is not a prime number), where the collection 
of possible values of n is the integers. The statement (∀n)[S(n) → C(n)] can be written 
in English as “for all integers n, if n is a perfect square greater than 1, then n is a 
composite number” (this statement happens to be true). We could rephrase this statement by 
saying “for all perfect squares n greater than 1, the number n is a composite number,” 
or even more concisely as “all perfect squares greater than 1 are composite,” where 
it is taken as implicitly known that the terms “perfect square” and “composite” apply 
only to integers (and not other types of numbers). 

Changing the collection U in a statement of the form (∀x in U)P(x) can change 
the truth or falsity of the statement, so that the choice of U is crucial. For example, 
let R(x) = “the number x has a square root.” If we let U be the collection of positive 
real numbers, then the statement (∀x in U)R(x) is true. On the other hand, if we let 
W be the collection of all real numbers, then the statement (∀x in W)R(x) is certainly 
false. 

For the sake of completeness, we need to allow the case where the collection U 
has nothing in it. In that case, the statement (∀x in U)P(x) is always true, no matter 
what P(x) is, for the following reason. The statement “(∀x in U)P(x)” is equivalent 
to the statement “if x is in U, then P(x) is true.” When the collection U has nothing 
in it, then the statement “x is in U” is false, and hence the conditional statement “if x 
is in U, then P(x) is true” is true. 

For the other type of quantifier we are interested in, once again let P(x) be an 
expression with free variable x, and let U denote a collection of possible values of x. 
An existential quantifier applied to P(x) is the statement, denoted (∃x in U)P(x), 
which is true if P(x) is true for at least one value of x in U. If the collection U 
is understood from the context, then we will write (∃x)P(x). Observe that if the 
collection U has nothing in it, then the statement (∃x)P(x) is false. 

It is important to note that the phrase “at least one value of x in U” means one or 
more, possibly many, or even all x in U. In particular, if (∀x in U)P(x) is true, then 
(∃x in U)P(x) is true, except in the special case that U has nothing in it. Of course, 
the statement (∃x in U)P(x) does not imply that (∀x in U)P(x) is true, except in the 
case that U has either one thing or nothing in it. 
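
When the collection U is finite, both quantifiers can be evaluated directly, and the behavior for a collection with nothing in it can be observed. In the sketch below the function names `forall` and `exists` and the predicate `is_even` are our own; Python's built-in `all` and `any` follow the same conventions for an empty collection as those described above.

```python
def forall(U, P):
    """(for all x in U) P(x), for a finite collection U."""
    return all(P(x) for x in U)

def exists(U, P):
    """(there exists x in U) P(x), for a finite collection U."""
    return any(P(x) for x in U)

def is_even(n):
    return n % 2 == 0

print(forall([2, 4, 6], is_even))  # True
print(exists([1, 3, 4], is_even))  # True
print(forall([1, 3, 4], is_even))  # False

# A collection with nothing in it: the universal statement is
# true, and the existential statement is false.
print(forall([], is_even))  # True
print(exists([], is_even))  # False
```

The last two lines show the special case noted in the text: with nothing in U, the universal statement holds while the existential statement fails, so the former does not imply the latter.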

There are a variety of ways to write (∃x in U)P(x) in English, for example: 


There exists some x in U such that P(x) holds; 
There is x in U such that P(x) holds; 

There exists at least one x in U such that P(x) holds; 
For some value of x in U, the condition P(x) holds; 
It is the case that P(x) is true for some x in U. 


Let Q(r) = “person r has brown hair,” and let W be the collection of all people 
in the world. Then the statement (∃r in W)Q(r) would mean that “there is someone 
with brown hair,” or equivalently “some people have brown hair” (which is a 
true statement). Let E(m) = “m is an even number” and let M(m) = “m is a prime 
number,” where the collection of possible values of m is the integers. The statement 
“some integers are even and prime” can be expressed symbolically by first rephrasing 
it as “there exists x such that x is even and x is prime,” which is (∃x)[E(x) ∧ M(x)] 
(this statement is true, because 2 is both even and prime). 

The reader might wonder why we use only the above two types of quantifiers, 
and whether other quantifiers are needed. For example, the statement “no dog likes 
cats” clearly has a quantifier, but which quantifier is it? If we let U be the collection 
of all dogs, and if we let P(x) = “dog x likes cats,” then our statement is “there is no 
x in U such that P(x).” However, the expression “there is no x in U,” though certainly 
a quantifier of some sort, is neither a universal quantifier nor an existential quantifier. 
Fortunately, rather than needing to define a third type of quantifier to be able to 
handle the present statement, we can rewrite our statement in English as “every dog 
does not like cats,” and in symbols that becomes (∀x in U)(¬P(x)). In general, all 
the quantification that we need in mathematics can be expressed in terms of universal 
quantifiers and existential quantifiers. 
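
That “there is no x in U such that P(x)” says the same thing as “for every x in U, not P(x)” can be checked exhaustively for a small collection. The sketch below assumes a hypothetical collection of three dogs, and the names `rephrasings_agree` and `always_agree` are our own.

```python
from itertools import product

def rephrasings_agree(likes_cats):
    """likes_cats holds one truth value per dog: whether that dog likes cats."""
    no_dog_likes_cats = not any(likes_cats)               # "there is no x in U such that P(x)"
    every_dog_dislikes = all(not p for p in likes_cats)   # (for all x in U)(not P(x))
    return no_dog_likes_cats == every_dog_dislikes

# Try every possible combination of preferences for three hypothetical dogs.
always_agree = all(rephrasings_agree(v)
                   for v in product([True, False], repeat=3))

print(always_agree)  # True
```

A finite check of this kind is not a proof of the general fact, but it illustrates why no third type of quantifier is needed.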

We can form statements with more than one quantifier, as long as different quantifiers 
involve different variables. Suppose that P(x,y) = “x + y² = 3,” where x and 
y are real numbers. The statement (∀y)(∃x)P(x,y) can then be written in English as 
“for all y there exists some x such that x + y² = 3,” or equivalently “for each y there 
is some x such that x + y² = 3.” This statement is true, because for any real number y 
we can always solve for x in terms of y, yielding x = 3 − y². If we reverse the order 
of the quantifiers, we obtain the statement (∃x)(∀y)P(x,y), which can be written in 
English as “there exists some x such that for all y, the equation x + y² = 3 holds.” This 
statement is clearly false, because for any given x, there can be at most two values of 
y such that x + y² = 3. The order of the quantifiers therefore matters. 
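
The effect of reversing the quantifiers can be observed in a finite analogue of this example. The sketch below replaces the real numbers by two small ranges of integers, which is our own simplifying assumption (chosen so that 3 − y² always lies in the range for x); nested `all` and `any` calls then mirror the order of the quantifiers.

```python
xs = range(-1, 4)  # x ranges over -1, 0, 1, 2, 3 (an assumed finite domain)
ys = range(-2, 3)  # y ranges over -2, -1, 0, 1, 2 (an assumed finite domain)

def P(x, y):
    """The expression x + y^2 = 3."""
    return x + y**2 == 3

# (for all y)(there exists x) P(x, y): the outer quantifier is the outer call.
forall_exists = all(any(P(x, y) for x in xs) for y in ys)

# (there exists x)(for all y) P(x, y): the same expression, quantifiers reversed.
exists_forall = any(all(P(x, y) for y in ys) for x in xs)

print(forall_exists)  # True
print(exists_forall)  # False
```

In the first statement the choice of x may depend on y, whereas in the second a single x must work for every y, which is impossible here.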

When attempting to prove a theorem, the statement of which involves multiple 
quantifiers, it is sometimes useful to translate the statement of the theorem into sym- 
bols, to help keep track of the meaning of the quantifiers. Suppose that we are given 
the statement “if x is a non-negative real number, then x is a perfect square.” This 
statement can be interpreted as a doubly quantified statement by rephrasing it as “for 
each non-negative real number x, there is some real number y such that x = y².” 
Written symbolically, the statement is 


(∀x in the non-negative real numbers)(∃y in the real numbers)(x = y²). 


Once again, it can be seen that reversing the order of the quantifiers in this statement 
would change its meaning. A lack of attention to the order of quantifiers can easily 
lead to mistakes in proving theorems that have statements with multiple quantifiers. 
A very important instance of the significance of the order of multiple quantifiers is 
in the “ε-δ” proofs treated in real analysis courses; see Section 7.8 for a similar type 
of proof from real analysis, and see any introductory real analysis text for a detailed 
discussion of ε-δ proofs. 

A non-mathematical example of a statement that can be clarified by writing it 
symbolically in terms of quantifiers is the statement “someone is hit by a car every 
hour,” which we encountered previously. Suppose that the possible values of x are 
all people, that the possible values of t are all hour-long time intervals that start 
precisely on the hour and that C(x,t) = “person x is hit by a car at time t.” The 
statement “someone is hit by a car every hour” can then be written symbolically 
as (∀t)(∃x)C(x,t). Once again, the order of the quantifiers matters. The statement 
(∃x)(∀t)C(x,t) would mean that there is a single person who gets hit by a car every 
hour, which is not what the original statement intended to say. 

There are eight possible generic ways of writing two quantifiers in a statement 
that has variables. Most of the eight possibilities have different meanings from one 
another. Suppose, for example, that the possible values of x are all people, the possible 
values of y are all types of fruit, and that L(x,y) = “person x likes to eat fruit y.” The 
eight ways of applying two quantifiers to L(x,y) are as follows. 


(1) (∀x)(∀y)L(x,y). This statement can be written in English as “for each person 
x, for each type of fruit y, person x likes to eat y,” and more simply as “every 
person likes every type of fruit.” To verify whether this statement is true, we 
would have to ask each person in the world if she likes every type of fruit; if 
even one person does not like one type of fruit, then the statement would be 
false. 


(2) (∀y)(∀x)L(x,y). This statement can be written as “for each type of fruit y, for 
each person x, we know x likes to eat y,” and more simply as “every type of 
fruit is liked by every person.” This statement is equivalent to Statement 1. 


(3) (∀x)(∃y)L(x,y). This statement can be written as “for each person x, there is 
a type of fruit y such that x likes to eat y,” and more simply as “every person 
likes at least one type of fruit.” To verify whether this statement is true, we 
would have to ask each person in the world if she likes some type of fruit; if 
at least one person does not like any type of fruit, then the statement would 
be false. 


(4) (∃x)(∀y)L(x,y). This statement can be written as “there is a person x such that 
for all types of fruit y, person x likes to eat y,” and more simply as “there is a 
person who likes every type of fruit.” To verify whether this statement is true, 
we would start asking one person at a time if she likes every type of fruit; 
as soon as we found one person who answers yes, we would know that the 
statement is true, and we could stop asking more people. If no such person is 
found, then the statement would be false. 


(5) (∀y)(∃x)L(x,y). This statement can be written as “for each type of fruit y, 
there is a person x such that x likes to eat y,” and more simply as “every type 
of fruit is liked by at least one person.” To verify whether this statement is 
true, we would have to list all the types of fruit, and then for each type of fruit, 
ask one person at a time whether she likes the fruit; once we found someone 
who liked that fruit, we could move on to the next fruit, and again ask one 
person at a time about it. For the statement to be true, we would have to find 
at least one person per fruit, though the same person could be selected for 
more than one fruit. 


(6) (∃y)(∀x)L(x,y). This statement can be written as “there is a type of fruit y 
such that for all persons x, we know that x likes to eat y,” and more simply as 
“there is a type of fruit that all people like.” To verify whether this statement 
is true, we would have to list all the types of fruit, and then for one type of 
fruit at a time, ask each person in the world if she likes that type of fruit; as 
soon as we found one type of fruit that everyone likes, we would know that 
the statement is true, and we could stop asking about more types of fruit. 


(7) (∃x)(∃y)L(x,y). This statement can be written as “there is a person x such 
that there is a type of fruit y such that x likes to eat y,” and more simply as 
“there is a person who likes at least one type of fruit.” To verify whether this 
statement is true, we would have to start asking one person at a time if she 
likes some type of fruit; as soon as we found one person who answers yes, we 
would know that the statement is true, and we could stop asking more people. 


(8) (∃y)(∃x)L(x,y). This statement can be written as “there is a type of fruit y 
such that there is a person x such that x likes to eat y,” and more simply as 
“there is a type of fruit that is liked by at least one person.” This statement is 
equivalent to Statement 7. 


In the above example we had eight cases, because there were two variables. When 
there are more variables, then the number of cases will be even larger. Also, we 
observe that whereas most of the cases in the above example are different from one 
another, there exist some examples of statements where some of the distinct cases 
above happen to coincide (for example, where the roles of x and y in P(x,y) are 
equal). 
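
The eight statements above can be evaluated mechanically when the collections of people and fruit are finite. The following is a minimal sketch, assuming a small made-up table of who likes what (the names and preferences are hypothetical, not part of the example in the text). Python's `all` plays the role of the universal quantifier and `any` the role of the existential quantifier; both stop early, just as in the verification procedures described above.

```python
# Hypothetical data: three people, three fruits, and who likes what.
people = ["Ana", "Ben", "Cy"]
fruits = ["apple", "banana", "cherry"]
likes = {
    ("Ana", "apple"), ("Ana", "banana"),
    ("Ben", "apple"),
    ("Cy", "apple"), ("Cy", "cherry"),
}


def L(x, y):
    """L(x, y): person x likes to eat fruit y."""
    return (x, y) in likes


# One entry per statement (1)-(8); the order of all/any mirrors the
# order of the quantifiers.
statements = {
    1: all(all(L(x, y) for y in fruits) for x in people),  # (∀x)(∀y)
    2: all(all(L(x, y) for x in people) for y in fruits),  # (∀y)(∀x)
    3: all(any(L(x, y) for y in fruits) for x in people),  # (∀x)(∃y)
    4: any(all(L(x, y) for y in fruits) for x in people),  # (∃x)(∀y)
    5: all(any(L(x, y) for x in people) for y in fruits),  # (∀y)(∃x)
    6: any(all(L(x, y) for x in people) for y in fruits),  # (∃y)(∀x)
    7: any(any(L(x, y) for y in fruits) for x in people),  # (∃x)(∃y)
    8: any(any(L(x, y) for x in people) for y in fruits),  # (∃y)(∃x)
}

for number, value in statements.items():
    print(number, value)
```

With this particular table, Statement 6 is true (everyone likes apples) while Statement 4 is false (no one likes every fruit), which shows that these two statements are genuinely different.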

Some statements with quantifiers imply others. For the sake of avoiding special 
cases, we will assume that the collection U, which is often not written explicitly but 
is implicitly assumed, always has something in it. With one variable, we saw that 
(∀x)P(x) implies (∃x)P(x). With two variables, the various implications are shown 
in Figure 1.5.1. 

We now look at the negation of statements with quantifiers. For example, let 
Q = “all people have red hair.” The negation of this statement can, most directly, 
be written as ¬Q = “it is not the case that all people have red hair.” For this last 
statement to be true, it would have to be the case that at least one person does not 
have red hair. Hence, we could rewrite ¬Q as “there are people who do not have 
red hair.” We can rewrite Q and ¬Q using symbols as follows. Let P(x) = “person 
x has red hair.” Then Q = (∀x)P(x), and ¬Q = (∃x)(¬P(x)). It is very important 
to recognize that ¬Q is not the same as the statement “all people do not have red 
hair,” which in symbols would be written (∀x)(¬P(x)). This last statement is much 
stronger than is needed to say that Q is false. The effect of the negation of Q is to 
change the quantifier, as well as to negate the statement being quantified. 


(∀x)(∀y)P(x,y)  ⟺  (∀y)(∀x)P(x,y) 
       ⇓                  ⇓ 
(∃x)(∀y)P(x,y)      (∃y)(∀x)P(x,y) 


Fig. 1.5.1. 


Similar reasoning holds for the negation of a statement with an existential 
quantifier. Let R = “there is a pig with wings.” The negation of this statement can be 
written most directly as ¬R = “it is not the case that there is a pig with wings,” and 
more simply as ¬R = “all pigs have no wings.” (It would be more natural in English 
to say “no pigs have wings,” but that phrasing is not useful to us here, because we 
do not have a quantifier that corresponds directly to “no pigs.”) Let W(x) = “pig x 
has wings.” Then R = (∃x)W(x), and ¬R = (∀x)(¬W(x)). Observe that ¬R is not the 
same as the statement “there is a pig with no wings,” which in symbols would be 
written (∃x)(¬W(x)). This last statement is much weaker than is needed to say that 
R is false. Again, the effect of the negation of R is to change the quantifier, as well as 
to negate the statement being quantified. 

The two cases examined above are completely typical, as we now see. 


Fact 1.5.1. Let P(x) be a statement with free variable x, which takes values in some 
collection U. 


1. ¬[(∀x in U)P(x)] ⇔ (∃x in U)(¬P(x)). 
2. ¬[(∃x in U)P(x)] ⇔ (∀x in U)(¬P(x)). 


Unlike the equivalences discussed in Section 1.3, we cannot use truth tables to 
verify the equivalences in Fact 1.5.1, though they are true nonetheless, based on the 
meanings of the quantifiers. 
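
Although truth tables cannot establish Fact 1.5.1 in general, the two equivalences can at least be checked exhaustively when U is a small finite collection. The sketch below is an illustration under that assumption, not a proof: it runs through every possible predicate P on a three-element U (the choice of U is arbitrary) and confirms that both sides of each equivalence always agree.

```python
from itertools import product


def check_fact(U):
    """Check both parts of Fact 1.5.1 for every predicate P on the finite list U."""
    U = list(U)
    for values in product([False, True], repeat=len(U)):
        P = dict(zip(U, values))  # one assignment of truth values = one predicate
        # Part 1: not (for all x, P(x))  <=>  there is x with not P(x)
        part1 = (not all(P[x] for x in U)) == any(not P[x] for x in U)
        # Part 2: not (there is x with P(x))  <=>  for all x, not P(x)
        part2 = (not any(P[x] for x in U)) == all(not P[x] for x in U)
        if not (part1 and part2):
            return False
    return True


fact_holds = check_fact([0, 1, 2])
print(fact_holds)
```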

We can use the above equivalences to negate statements with more than one 
quantifier. For example, suppose that f is a function that takes real numbers to real 
numbers (for example f(x) = x² for all real numbers x). Let Q = “for each real number w, 
there is some real number y such that f(y) = w.” We would like to find ¬Q. We start 
by writing Q symbolically. Let P(w,y) = “f(y) = w.” Then Q = (∀w)(∃y)P(w,y). 
Using our equivalences we have 


¬Q ⇔ ¬[(∀w)(∃y)P(w,y)] ⇔ (∃w)¬[(∃y)P(w,y)] 
   ⇔ (∃w)(∀y)(¬P(w,y)). 


Rephrasing this last expression in English yields ¬Q = “there exists a real number 
w such that for all real numbers y, the relation f(y) ≠ w holds.” It is often easier to 
negate statements with multiple quantifiers by first translating them into symbolic 
form, negating them symbolically and then translating back into English. With a bit 
of practice it is possible to negate such statements directly in English as well, as long 
as the statements are not too complicated. 
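
To see the negated statement at work, we can replace the real numbers by a small finite collection, where both Q and ¬Q can be evaluated directly. The sketch below makes that simplifying assumption (the collection D is a stand-in chosen for illustration) and uses f(y) = y². Here ¬Q is true, and the code lists the values of w that witness it.

```python
D = range(-3, 4)  # finite stand-in for the real numbers (an assumption)


def f(y):
    return y * y


def Q(f, D):
    """(∀w)(∃y) f(y) = w, with w and y ranging over D."""
    return all(any(f(y) == w for y in D) for w in D)


def not_Q(f, D):
    """(∃w)(∀y) f(y) ≠ w, the symbolic negation derived above."""
    return any(all(f(y) != w for y in D) for w in D)


# Every w for which no y satisfies f(y) = w is a witness for ¬Q.
witnesses = [w for w in D if all(f(y) != w for y in D)]

print(Q(f, D), not_Q(f, D), witnesses)
```

Each witness is a value that f never takes on D; a single such value is enough to make ¬Q true.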

Finally, we turn to rules of inference with quantifiers. There are four such rules 
of inference, and while their use requires a bit more care than the rules of inference 
in Section 1.4, they are used for the same purpose, which is to show the validity of 
logical arguments. 


Universal Instantiation       (∀x in U)P(x) 
                              ------------- 
                              P(a) 


where a is anything in U. 


Existential Instantiation     (∃x in U)P(x) 
                              ------------- 
                              P(b) 


where b is something in U, and where the symbol “b” does not 
already have any other meaning in the given argument. 


Universal Generalization      P(c) 
                              ------------- 
                              (∀x in U)P(x) 


where c is an arbitrary thing in U. 


Existential Generalization    P(d) 
                              ------------- 
                              (∃x in U)P(x) 


where d is something in U. 


Observe the restrictions on the variables used in each rule. For example, in 
Existential Instantiation, it is important that when we deduce from (∃x in U)P(x) that 
P(b) holds for some b in U, we cannot assume that the letter “b” refers to any other 
symbol already being used in the argument. Hence we need to choose a new letter, 
rather than one already used for something else. In Universal Generalization, when 
we deduce from P(c) that (∀x in U)P(x), it is crucial that c be an arbitrarily chosen 
member of U. Otherwise, we could not conclude that P(x) is true for all x in U. 
This last observation is crucial when we attempt to prove mathematical statements 
involving universal quantifiers, as we will see in Section 2.5, and throughout this 
book. Though we will not necessarily be referring to them by name, these four rules 
of inference will be used regularly in our mathematical proofs. See [Cop68, Chapter 10] 
for further discussion of these rules of inference. 

An example of a simple logical argument involving quantifiers is the following. 


Every cat that is nice and smart likes chopped liver. Every Siamese cat is 
nice. There is a Siamese cat that does not like chopped liver. Therefore there 
is a stupid cat. 


(We are assuming here that “stupid” is the negation of “smart.”) To translate this 
argument into symbols, let U be the collection of all cats, let N(x) = “cat x is nice,” 
let S(x) = “cat x is smart,” let C(x) = “cat x likes chopped liver” and let T(x) = “cat 
x is Siamese.” The argument then becomes 


(∀x in U)[(N(x) ∧ S(x)) → C(x)] 
(∀x in U)[T(x) → N(x)] 
(∃x in U)[T(x) ∧ ¬C(x)] 
------------------------------- 
(∃x in U)[¬S(x)]. 


A derivation for this argument, using rules of inference from Section 1.4 as well 
as from this section, is 

(1) (∀x in U)[(N(x) ∧ S(x)) → C(x)] 
(2) (∀x in U)[T(x) → N(x)] 
(3) (∃x in U)[T(x) ∧ ¬C(x)] 
(4) T(a) ∧ ¬C(a)                 (3), Existential Instantiation 
(5) ¬C(a)                        (4), Simplification 
(6) T(a)                         (4), Simplification 
(7) T(a) → N(a)                  (2), Universal Instantiation 
(8) N(a)                         (7), (6), Modus Ponens 
(9) ¬¬N(a)                       (8), Double Negation 
(10) (N(a) ∧ S(a)) → C(a)        (1), Universal Instantiation 
(11) ¬(N(a) ∧ S(a))              (10), (5), Modus Tollens 
(12) ¬N(a) ∨ ¬S(a)               (11), De Morgan’s Law 
(13) ¬S(a)                       (12), (9), Modus Tollendo Ponens 
(14) (∃x in U)[¬S(x)]            (13), Existential Generalization. 


Observe that in line (4) we chose some letter that was not in use prior to that 
line, because we are using Existential Instantiation. We needed to use that rule of 
inference at that point in the derivation in order to remove the quantifier in line (3) of 
the premises, which then allows us to use the rules of inference given in Section 1.4 
(which did not involve quantifiers). In lines (7) and (10) we were free to use the same 
letter “a” as in line (4), because Universal Instantiation allows us to choose anything 
in U that we want. 
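
A derivation shows that an argument is valid; a complementary sanity check is to search for a counterexample, that is, a collection of cats for which all three premises are true and the conclusion is false. The sketch below does this by brute force for collections of at most three cats, where each cat is represented by a tuple of truth values for N, S, C and T. The size bound and the representation are assumptions made for illustration, and finding no counterexample among small collections is supporting evidence rather than a proof.

```python
from itertools import product


def premises_hold(cats):
    """cats is a tuple of (N, S, C, T) truth values, one tuple per cat."""
    p1 = all((not (n and s)) or c for (n, s, c, t) in cats)  # (N and S) -> C
    p2 = all((not t) or n for (n, s, c, t) in cats)          # T -> N
    p3 = any(t and not c for (n, s, c, t) in cats)           # some T and not C
    return p1 and p2 and p3


def conclusion_holds(cats):
    return any(not s for (n, s, c, t) in cats)               # some cat is not smart


def find_counterexample(max_size=3):
    """Return an interpretation with true premises and false conclusion, or None."""
    one_cat = list(product([False, True], repeat=4))
    for size in range(1, max_size + 1):
        for cats in product(one_cat, repeat=size):
            if premises_hold(cats) and not conclusion_holds(cats):
                return cats
    return None


print(find_counterexample())
```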


Exercises 


Exercise 1.5.1. Suppose that the possible values of x are all people. Let Y(x) = 
“x has green hair,” let Z(x) = “x likes pickles” and let W(x) = “x has a pet frog.” 
Translate the following statements into words. 
(1) (∀x)Y(x).                  (4) (∃x)[Y(x) … W(x)]. 
(2) (∃x)Z(x).                  (5) (∀x)[W(x) … Z(x)]. 
(3) (∀x)[W(x) ∧ Z(x)]. 


Exercise 1.5.2. Suppose that the possible values of x and y are all cars. Let L(x, y) = 
“x is as fast as y,” let M(x,y) = “x is as expensive as y” and let N(x, y) = “x is as old 
as y.” Translate the following statements into words. 


(1) (∃x)(∀y)L(x,y).            (3) (∃y)(∀x)[L(x,y) ∨ N(x,y)]. 
(2) (∀x)(∃y)M(x,y).            (4) (∀y)(∃x)[¬M(x,y) → L(x,y)]. 


Exercise 1.5.3. Suppose that the possible values of y are all cows. Let P(y) = “y is 
brown,” let Q(y) = “y is four years old” and let R(y) = “y has white spots.” Translate 
the following statements into symbols. 


(1) There is a brown cow. 

(2) All cows are four years old. 

(3) There is a brown cow with white spots. 

(4) All four-year-old cows have white spots. 

(5) There exists a cow such that if it is four years old, then it has no white spots. 
(6) All cows are brown if and only if they are not four years old. 

(7) There are no brown cows. 


Exercise 1.5.4. Suppose that the possible values of p and q are all fruit. Let A(p,q) = 
“p tastes better than q,” let B(p,q) = “p is riper than q” and let C(p,q) = “p is the 
same species as q.” Translate the following statements into symbols. 


(1) There is a fruit such that all fruit taste better than it. 

(2) For every fruit, there is a fruit that is riper than it. 

(3) There is a fruit such that all fruit taste better than it and is not riper than it. 

(4) For every fruit, there is a fruit of the same species that does not taste better 
than it. 


Exercise 1.5.5. Convert the following statements, which do not have their quantifiers 
explicitly given, into statements with explicit quantifiers, both in symbols and in 
English. 


(1) People are nice. 

(2) Someone gave me a present. 

(3) Cats like eating fish and taking naps. 

(4) I liked one of the books I read last summer. 
(5) No one likes ice cream and pickles together. 


Exercise 1.5.6. Write a negation of each statement. Do not write the word “not” 
applied to any of the objects being quantified (for example, do not write “Not all 
boys are good” for Part (1) of this exercise). 


(1) All boys are good. 
(2) There are bats that weigh 50 lbs or more. 

(3) The equation x² − 2x > 0 holds for all real numbers x. 

(4) Every parent has to change diapers. 

(5) Every flying saucer is aiming to conquer some galaxy. 

(6) There is an integer n such that n² is a perfect number. 

(7) There is a house in Kansas such that everyone who enters the house goes 
blind. 

(8) Every house has a door that is white. 

(9) At least one person in New York City owns every book published in 1990. 


Exercise 1.5.7. Negate the following statement: There exists an integer Q such that 
for all real numbers x > 0, there exists a positive integer k such that ln(Q − x) > 
5 and that if x <k then Q is cacophonous. (The last term used in this exercise is 
meaningless.) 


Exercise 1.5.8. Negate the following statement: For every real number ε > 0 there 
exists a positive integer k such that for all positive integers n, it is the case that 
|a_n − k²| < ε. 


Exercise 1.5.9. Let x be a real number. The number x is gelatinous if it is both 
phlegmatic, and if for every integer n there is some real number y such that y² 
upper-encapsulates x or y + n lower-encapsulates x. How would you characterize a 
non-gelatinous real number x? (The terms used in this exercise are meaningless.) 


Exercise 1.5.10. Someone claims that the argument 


(∃x in U)[P(x) ∧ Q(x)] 
(∃x in U)[M(x)] 
---------------------- 
(∃x in U)[M(x) ∧ Q(x)] 


is valid, using the alleged derivation 


(1) (∃x in U)[P(x) ∧ Q(x)] 
(2) (∃x in U)[M(x)] 
(3) P(a) ∧ Q(a)                  (1), Existential Instantiation 
(4) Q(a)                         (3), Simplification 
(5) M(a)                         (2), Existential Instantiation 
(6) M(a) ∧ Q(a)                  (5), (4), Adjunction 
(7) (∃x in U)[M(x) ∧ Q(x)]       (6), Existential Generalization. 


Find the flaw(s) in the derivation. 

Exercise 1.5.11. Write a derivation for each of the following arguments. 


(1) (∀x in U)[R(x) → C(x)] 
    (∀x in U)[T(x) → R(x)] 
    ------------------------ 
    (∀x in U)[¬C(x) → ¬T(x)]. 
(3) 


(4) 


¬(∀x in W)[M(x)] 
(∀x in W)[E(x)] 
(∃x in W)[N(x)]. 


Exercise 1.5.12. Write a derivation for each of the following arguments. 


(1) Every fish that is bony is not pleasant to eat. Every fish that is not bony is 
slimy. Therefore every fish that is pleasant to eat is slimy. 

(2) Each high school student in Slumpville who takes an honors class is cool. 
There is a high school student in Slumpville who is smart and not cool. There- 
fore there is a high school student in Slumpville who is smart and not taking 
an honors class. 

(3) Every baby who eats will make a mess and drool. Every baby who drools will 
smile. There is a baby who eats and screams. Therefore there is a baby who 
smiles. 

(4) Every cockroach that is clever eats garbage. There is a cockroach that likes 
dirt or does not like dust. For each cockroach, it is not the case that it likes 
dirt or eats garbage. Therefore there is a cockroach such that it is not the case 
that if it is not clever then it likes dust. 


2 


Strategies for Proofs 


Rigour is to the mathematician what morality is to men. 
— André Weil (1906-1998) 


2.1 Mathematical Proofs—What They Are and 
Why We Need Them 


Not all mathematics involves proofs. We learn a good bit of arithmetic in grade 
school long before we learn how to prove that the rules of arithmetic are correct. 
Mathematics originated in the ancient world, in various cultures, prior to the notion 
of proof. It was the contribution of the ancient Greeks (who, contrary to popular 
misconception, did not invent mathematics, nor even geometry) to bring the notion 
of proof into mathematics. The first use of proof is generally attributed to Thales of 
Miletus, who lived in the sixth century B.C.E. Euclid, who lived in Alexandria in the 
third century B.C.E., brought the notion of proofs based on axioms to its first peak 
of success. See [Hea21] for a discussion of ancient Greek mathematics. 

Euclid used an axiomatic system—which is needed for proofs—in the field of 
geometry. Today, virtually all branches of pure mathematics are based on axiomatic 
systems, and work in pure mathematics involves the construction of rigorous proofs 
for new theorems. Much of the great mathematics of the past has been recast with a 
precision missing from its original treatment. Abstract algebra, for example, which 
received its modern form only in the last one hundred years, reconstructs the elemen- 
tary algebra studied in high school in a rigorous, axiomatic fashion. A lot of applied 
mathematics today also has rigorous foundations (though the work of applied math- 
ematicians, while no less challenging than pure mathematics, is not always oriented 
toward proofs). 

Be the above as it may, the importance of proofs should be put in the proper 
perspective. Intuition, experimentation and even play are no less important in today’s 
mathematical climate than rigor, because it is only by our intuition that we decide 
what new results to try to prove. The relation between intuition and formal rigor is not 
a trivial matter. Formal proofs and intuitive ideas essentially occupy different realms, 
and we cannot “prove” that an intuitive idea is true. Instead, there is essentially a 
dialectical relationship between intuition and rigor. We set up formal systems that 
mirror our intuition as closely as possible; we then use what we prove rigorously to 
further our intuitive understanding, which in turn points to new theorems requiring 
rigorous proofs, and so forth. 

Mathematics has moved over time in the direction of ever greater rigor, though 
why that has happened is a question we leave to historians of mathematics to explain. 
We can, nonetheless, articulate a number of reasons why mathematicians today use 
proofs. The main reason, of course, is to be sure that something is true. Contrary 
to popular misconception, mathematics is not a formal game in which we derive 
theorems from arbitrarily chosen axioms. Rather, we discuss various types of mathe- 
matical objects, some geometric (for example, circles), some algebraic (for example, 
polynomials), some analytic (for example, derivatives) and the like. To understand 
these objects fully, we need to use both intuition and rigor. Our intuition tells us 
what is important, what we think might be true, what to try next and so forth. Unfor- 
tunately, mathematical objects are often so complicated or abstract that our intuition 
at times fails, even for the most experienced mathematicians. We use rigorous proofs 
to verify that a given statement that appears intuitively true is indeed true. 

Another use of mathematical proofs is to explain why things are true, though 
not every proof does that. Some proofs tell us that certain statements are true, but 
shed no intuitive light on their subjects. Other proofs might help explain the ideas 
that underpin the result being proved; such proofs are preferable, though any proof, 
even if non-intuitive, is better than no proof at all. A third reason for having proofs 
in mathematics is pedagogical. A student (or experienced mathematician for that 
matter) might feel that she understands a new concept, but it is often only when 
attempting to construct a proof using the concept that a more thorough understanding 
emerges. Finally, a mathematical proof is a way of communicating to another person 
an idea that one person believes intuitively, but the other does not. 

What does a rigorous proof consist of? The word “proof” has a different meaning 
in different intellectual pursuits. A “proof” in biology might consist of experimental 
data confirming a certain hypothesis; a “proof” in sociology or psychology might 
consist of the results of a survey. What is common to all forms of proof is that they 
are arguments that convince experienced practitioners of the given field. So too for 
mathematical proofs. Such proofs are, ultimately, convincing arguments that show 
that the desired conclusions follow logically from the given hypotheses. 

There is no formal definition of proof that mathematicians use (except for math- 
ematical logicians, when they develop formal theories of proofs, but these theories 
are distinct from the way mathematicians go about their daily business). Although we 
briefly discussed rules of inference and logical derivations in Section 1.4, what we are 
really interested in for the rest of this book is the way contemporary mathematicians 
do proofs, in order to prepare you for the kinds of proofs and basic mathematical 
concepts you will encounter in advanced mathematics courses. 

Mathematicians who are not logicians virtually never write proofs as strings of 
logical symbols and rules of inference, for a number of reasons. First, and foremost, 
mathematical proofs are often much too long and complicated to be conveniently 
broken down into the two-column (statement-justification) format. Second, 
the mathematical ideas of the proof, not its logical underpinnings, are the main issue 
on which we want to focus, and so we do not even mention the rules of logical 
inference used, but rather mention only the mathematical justification of each step. 
Third, mathematicians who are not logicians, which means most mathematicians, 
find long strings of logical symbols not only unpleasant to look at, but in most cases 
rather difficult to follow. See [EFT94, pp. 70-71] for a fully worked out example 
of putting a standard mathematical proof in group theory into a two-column format 
using formal logic. The mathematical result proved in that example is given in Ex- 
ercise 7.2.8; see Sections 7.2 and 7.3 for a brief introduction to groups. One look at 
the difference between the mathematicians’ version of the proof and the logicians’ 
version, in terms of both length and complexity, should suffice to convince the reader 
why mathematicians do things as they do. 

To some extent mathematicians relate to proofs the way the general public often 
reacts to art—they know it when they see it. But a proof is not like a work of modern 
art, where self-expression and creativity are key, and all rules are to be broken, but 
rather like classical art that followed formal rules. (This analogy is not meant as an 
endorsement of the public’s often negative reaction to serious modern art—classical 
art simply provides the analog we need here.) Also similarly to art, learning to recog- 
nize and construct rigorous mathematical proofs is accomplished not by discussing 
the philosophy of what constitutes a proof, but by learning the basic techniques, 
studying correct proofs, and, most importantly, doing lots of them. Just as art criti- 
cism is one thing and creating art is another, philosophizing about mathematics and 
doing mathematics are distinct activities (though of course it helps for the practi- 
tioner of each to know something about the other). For further discussion about the 
conceptual nature of proofs, see [Die92, Section 3.2] or [EC89, Chapter 5], and for 
more general discussion about mathematical activity see [Wil65] or [DHM95]. 

Ultimately, a mathematical proof is a convincing argument that starts from the 
premises, and logically deduces the desired conclusion. How someone may have 
thought of a proof is one thing, but the proof itself has to proceed logically from start 
to finish. The distinction between a valid mathematical proof itself and how it was 
thought of is something that is very important to keep in mind when you work on 
your own proofs. When solving a problem, you first try all sorts of approaches to find 
something that works, perhaps starting with the hypotheses and working forwards, 
or starting with the conclusion and working backwards, or some combination of 
the two. Whatever your explorations might be, a record of such exploration should 
never be mistaken for a final proof. Confusing the exploration with the proof is a 
very common mistake for students first learning advanced mathematics. We will see 
some examples of this distinction later on. 

What is it that we prove in mathematics? We prove statements, which are usu- 
ally called theorems, propositions, lemmas, corollaries and exercises. There is not 
much difference between these types of statements; all need proofs. Theorems tend 
to be important results; propositions are usually slightly less important than theo- 
rems; lemmas are statements that are used in the proofs of other results; corollaries 
are statements that follow easily from other results; exercises are statements that are 
left to the reader to prove. When discussing proofs, we will generically refer to 
“theorems” when we mean any of theorems, propositions and the like. 

Let us examine the statement of a very famous theorem. 


Theorem 2.1.1 (Pythagorean Theorem). Let △ABC be a right triangle, with sides 
of length a, b and c, where c is the length of the hypotenuse. Then a² + b² = c². 


When asked what the Pythagorean Theorem says, students often state “a² + b² = 
c².” This expression alone is not the statement of the theorem; indeed, it is not a 
statement at all. Unless we know that a, b and c are the lengths of the sides of a right 
triangle, with c the length of the hypotenuse, we cannot conclude that a² + b² = c². 
(The formula a² + b² = c² is never true for the sides of a non-right triangle.) It is 
crucial to state theorems with all their hypotheses if we want to be able to prove 
them. 
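
The role of the hypothesis can be checked numerically on a small family of triangles. The sketch below is an illustration only: it assumes triangles whose vertices have integer coordinates between 0 and 3, and it detects a right angle with a dot product, neither of which appears in the statement of the theorem. It confirms that, among these triangles, the equation a² + b² = c² (with c the longest side) holds exactly for the right triangles.

```python
from itertools import combinations, product


def squared_sides(A, B, C):
    """Squared side lengths of triangle ABC, smallest first."""
    def d2(P, Q):
        return (P[0] - Q[0]) ** 2 + (P[1] - Q[1]) ** 2
    return sorted([d2(A, B), d2(B, C), d2(C, A)])


def has_right_angle(A, B, C):
    """True if the two edges meeting at some vertex have zero dot product."""
    for V, P, Q in ((A, B, C), (B, C, A), (C, A, B)):
        dot = (P[0] - V[0]) * (Q[0] - V[0]) + (P[1] - V[1]) * (Q[1] - V[1])
        if dot == 0:
            return True
    return False


def is_degenerate(A, B, C):
    """True if the three points are collinear (zero cross product)."""
    return (B[0] - A[0]) * (C[1] - A[1]) - (B[1] - A[1]) * (C[0] - A[0]) == 0


points = list(product(range(4), repeat=2))
mismatches = []
for A, B, C in combinations(points, 3):
    if is_degenerate(A, B, C):
        continue
    a2, b2, c2 = squared_sides(A, B, C)
    # The equation should hold exactly when the triangle is a right triangle.
    if (a2 + b2 == c2) != has_right_angle(A, B, C):
        mismatches.append((A, B, C))

print(len(mismatches))
```

Working with squared lengths keeps all of the arithmetic in the integers, so no rounding error can affect the comparison.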

We will not give a proof of the Pythagorean Theorem; see [Loo40] for a variety 
of proofs. Rather, we want to consider its logical form. Although the words “if ... 
then” do not appear in the statement of the theorem, the statement is nonetheless a 
conditional statement (as discussed in Section 1.2). If we let P = “a, b and c are 
the lengths of the sides of a right triangle, with c the length of the hypotenuse,” 
and let Q = “a² + b² = c²,” then the theorem has the form P → Q. Many (if not 
all) statements of theorems are essentially conditional statements, or combinations 
of them, even though the words “if ... then” do not appear explicitly. A proof of 
a theorem is therefore an argument that shows that one thing implies another, or a 
combination of such arguments. It is usually much easier to formulate proofs for 
theorems when we recognize that they have the form P → Q, even if they are not 
given to us in that form. 

Theorems are not proved in a vacuum. To prove one theorem, we usually need to 
use various relevant definitions, and theorems that have already been proved. If we 
do not want to keep going backwards infinitely, we need to start with some objects 
that we use without definition, as well as some facts about these objects that are as- 
sumed without proof. Such facts are called axioms, and a body of knowledge that can 
be derived from a set of axioms is called an axiomatic system. In modern abstract 
mathematics, we take set theory as our basis for all arguments. In each branch of 
mathematics, we then give specific axioms for the objects being studied. For exam- 
ple, in abstract algebra, we study constructs such as groups, rings and fields, each of 
which is defined by a list of axioms; the axioms for groups are given in Section 7.2. 

In Chapters 3-6 we will discuss sets, and various basic constructs using sets 
such as functions and relations, which together form the basis for much of modern 
mathematics. Our concern in the present chapter, by contrast, is not with the basis 
upon which we rely when we construct proofs, but rather the construction of proofs 
themselves. It may appear as if we are doing things backwards, in that we are not 
starting with what we say is the basis for modern mathematics, but we want to be 
able to give proofs about sets in Chapter 3, so we need to know how to write proofs 
before discussing set theory. As a basis for our work in the present chapter, we will 
make use of standard definitions and properties of the familiar number systems such 
as the integers, rational numbers and real numbers. We will assume that the reader is 
informally familiar with these numbers. See the Appendix for a brief list of some of 
the standard properties of the real numbers. 

We conclude this section with our first example of a proof. You are probably 
familiar with the statement “the sum of even numbers is even.” This statement can 
be viewed in the form P → Q if we look at it properly, because it actually says “if 
n and m are even numbers, then n + m is an even number.” To construct a rigorous 
proof of our statement (as well as the corresponding result for odd numbers), we first 
need precise definitions of the terms involved. 

Our theorem is concerned with the integers, that is, the numbers 


..., −3, −2, −1, 0, 1, 2, 3, ... 


and so we need to assume that we know what the integers are, that we have the op- 
erations addition, subtraction, multiplication and division, and that these operations 
satisfy standard properties, for example the Distributive Law. Using only those stan- 
dard facts about the integers, we can make the following definition, which is the basis 
for our theorem and its proof. 


Definition 2.1.2. Let n be an integer. The number n is even if there is some integer 
k such that n = 2k. The number n is odd if there is some integer j such that n = 
2j + 1. △ 


As the reader knows intuitively, and as we will prove in Corollary 5.2.6, every 
integer is either even or odd, but not both. 

We are now ready to state and prove our theorem. This result may seem rather 
trivial, but our point here is to see a properly done proof, not to learn an exciting new 
result about numbers. 


Theorem 2.1.3. Let n and m be integers. 


1. If n and m are both even, then n + m is even. 
2. If n and m are both odd, then n + m is even. 
3. If n is even and m is odd, then n + m is odd. 


Proof. 


(1). Suppose that n and m are both even. Then there exist integers k and j such 
that n = 2k and m = 2j. Then 


n + m = 2k + 2j = 2(k + j). 


Because k and j are integers, so is k + j. Hence n + m is even. 


(2) & (3). These two parts are proved similarly to Part (1), and the details are 
left to the reader. 

There is a fourth possible case we did not state in Theorem 2.1.3, namely, the case 
when n is odd and m is even, because that case is really no different from Part (3) 
of the theorem, and hence it would not tell us anything new; it makes no difference 
whether we call the even number n and the odd number m, or vice versa. 
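
Definition 2.1.2 and Theorem 2.1.3 can also be explored computationally. The sketch below is a spot check on a finite range of integers, not a substitute for the proof; the function names and the range are choices made for illustration. Each helper returns the integer k or j required by the definition, and the check builds a witness for n + m out of the witnesses for n and m, mirroring the step n + m = 2(k + j) in the proof of Part (1).

```python
def even_witness(n):
    """Return an integer k with n == 2*k, or None if there is no such k."""
    return n // 2 if n % 2 == 0 else None


def odd_witness(n):
    """Return an integer j with n == 2*j + 1, or None if there is no such j."""
    return (n - 1) // 2 if n % 2 == 1 else None


def check_theorem(lo=-10, hi=10):
    """Return the pairs (n, m) with lo <= n, m <= hi for which a case fails."""
    failures = []
    for n in range(lo, hi + 1):
        for m in range(lo, hi + 1):
            kn, km = even_witness(n), even_witness(m)
            jn, jm = odd_witness(n), odd_witness(m)
            if kn is not None and km is not None:    # Part (1): both even
                ok = n + m == 2 * (kn + km)
            elif jn is not None and jm is not None:  # Part (2): both odd
                ok = n + m == 2 * (jn + jm + 1)
            elif kn is not None and jm is not None:  # Part (3): n even, m odd
                ok = n + m == 2 * (kn + jm) + 1
            else:                                    # fourth case: n odd, m even
                ok = n + m == 2 * (jn + km) + 1
            if not ok:
                failures.append((n, m))
    return failures


print(check_theorem())
```

The last branch handles the fourth case in the same way as Part (3), with the roles of n and m exchanged.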

The proof of Part (1) of Theorem 2.1.3 is quite simple, but there are a few fea- 
tures worth mentioning, because they are typical of what is found in virtually all our 
subsequent proofs (and in the proofs you will need to write). First, the proof relies 
completely on the definition of what it means to be an even or an odd integer. In a 
large number of proofs, going back to the formal definitions involved is the key step; 
forgetting to do so is a major source of error by students who are first learning about 
proofs. 

Second, observe that the proof is written in grammatically correct English. Com- 
plete sentences are used, with proper punctuation. Each sentence begins with a cap- 
ital letter, and ends with a period, even if the end of the sentence is in a displayed 
equation. Mathematical formulas and symbols are parts of sentences, and are treated 
no differently from other words. We will be writing all our proofs in this style; scratch 
work, by contrast, can be as careless as desired. The two-column method of writing 
proofs, which we used in our discussion of valid logical arguments in Section 1.4, 
and is often used in high school geometry, should be left behind at this point. Math- 
ematics texts and research papers are all written in the style of Theorem 2.1.3. See 
Section 2.6 for more about writing mathematics. 

An important consideration when writing a proof is recognizing what needs to be 
proved and what doesn’t. There is no precise formula for such a determination, but 
the main factor is the context of the proof. In an advanced book on number theory, 
it would be unnecessary to prove the fact that the sum of two even integers is even; 
it would be safe to assume that the reader of such a book would either have seen 
the proof of this fact, or could prove it herself. For us, however, because we are just 
learning how to do such proofs, it is necessary to write out the proof of this fact in 
detail, even though we know from experience that the result is true. The reasons to 
prove facts that we already know are twofold: first, in order to gain practice writing 
proofs, we start with simple results, so that we can focus on the writing, and not on 
mathematical difficulties; second, there are cases where “facts” that seem obviously 
true turn out to be false, and the only way to be sure is to construct valid proofs. 

Though mathematical proofs are logical arguments, observe that in the proof of 
Theorem 2.1.3 we did not use the logical symbols we discussed in Chapter 1. In 
general, it is not proper to use logical symbols in the writing of mathematical proofs. 
Logical symbols were used in Chapter 1 to help us become familiar with informal 
logic. When writing mathematical proofs, we make use of that informal logic, but 
we write using standard English (or whatever language is being used). 

For the record, in the proof of Theorem 2.1.3 we did make use of some of the 
rules of inference discussed in Section 1.4, though as will always be the case, these 
rules are not mentioned explicitly in proofs to avoid unnecessary length and clutter. 
For instance, the hypothesis in Part (1) has the form P ∧ Q, where P = “n is even” 
and Q = “m is even.” The proof starts by assuming that P ∧ Q is true. We then 
used Simplification to deduce that each of P and Q is true, so that we could apply 
the definition of even numbers to each, to deduce that each of the statements “there 
exists an integer k such that n = 2k” and “there exists an integer j such that m = 2j” 
holds. We then applied Adjunction to deduce that the statement “n = 2k and m = 2j” 
holds, so that we could do the calculation involving n + m. Finally, we made repeated 
use of Hypothetical Syllogism to put all the pieces of the proof together. Of course, 
even though mathematicians do not generally mention the rules of logical inference 
used in their proofs, care must be taken to ensure that the rules of inference are used 
correctly, even when not stated explicitly. 

One final comment on writing proofs: neither thinking up proofs nor writing 
them properly is easy, especially as the material under consideration becomes more 
and more abstract. Mathematics is not a speed activity, and you should not expect to 
construct proofs rapidly. You will often need to do scratch work first, before writing 
up the actual proof. As part of the scratch work, it is very important to figure out 
the overall strategy for the problem being solved, prior to looking at the details. 
What type of proof is to be used? What definitions are involved? Not every choice 
of strategy ultimately works, of course, and so any approach needs to be understood 
as only one possible way to attempt to prove the theorem. If one approach fails, try 
another. Every mathematician has, in some situations, had to try many approaches 
to proving a theorem before finding one that works; the same is true for students of 
mathematics. 


Exercises 


Exercise 2.1.1. Reformulate each of the following theorems in the form P → Q. 
(The statements of the theorems as given below are commonly used in mathematics 
courses; they are not necessarily the best possible ways to state these theorems.) 


(1) The area of the region inside a circle of radius r is πr^2. 

(2) Given a line l and a point P not on l, there is exactly one line m containing P 
that is parallel to l. 

(3) Let △ABC be a triangle, with sides of length a, b and c. Then 


\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}. 


(4) eb) =e” 
(5) (Fundamental Theorem of Calculus) Let f be a continuous function on [a,b], 
and let F be any function for which F’(x) = f(x). Then 


\int_a^b f(x)\,dx = F(b) - F(a). 


2.2 Direct Proofs 


As mentioned in the previous section, the statement of virtually every theorem, when 
viewed appropriately, is of the form P → Q, or some combination of such statements. 
For example, each of the three parts of Theorem 2.1.3 is of the form P → Q. To prove 
theorems, we therefore need to know how to prove statements of the form P → Q. 

The simplest form of proof, which we treat in this section, is the most obvious 
one: assume that P is true, and produce a series of steps, each one following from 
the previous ones, which eventually lead to Q. This type of proof is called a direct 
proof. This sort of proof deserves a name because there are other approaches 
that can be taken, as we will see in Section 2.3. An example of a direct proof is the 
proof of Theorem 2.1.3. 

How do we construct direct proofs? There is no single answer to this question, 
but some useful strategies exist. To start, it is important to recognize that what is 
“direct” about a direct proof is the way the proof reads when you are done writing 
it. The completed proof starts at the beginning (the statement P) and ends at the end 
(the statement Q), and shows how to get logically from the former to the latter. How 
you think of the proof is another matter entirely. The way a proof looks when you 
are done constructing it often has little relation to how you went about thinking of it, 
especially for more difficult proofs. Similarly to writing a literature paper, for which 
you might take notes, make an outline, prepare a rough draft and revise it a number of 
times, so too with constructing a rigorous mathematical proof—the final version may 
be the result of a process involving a number of distinct steps, and much revision. 

When constructing a proof, the first thing to do is specify what you are assuming, 
and what it is you are trying to prove. This comment may sound trivial, but the author 
has seen many students skip this important step in their rush to get to the details 
(which are usually more interesting). Then you pick a strategy for the proof; one 
such strategy is direct proof. The next stage is actually figuring out a proof, making 
use of your chosen strategy. If you cannot devise a proof using your chosen strategy, 
perhaps another strategy should be attempted. There is no fixed way of finding a 
proof; it requires experimentation, playing around and trying different things. Of 
course, with experience some standard ways of constructing proofs in certain familiar 
situations tend to suggest themselves. 

Even when the chosen strategy is direct proof, there are a number of ways of 
trying to figure out the details of the proof. To find a direct proof of P → Q, you 
might try assuming P, playing around with it, seeing where it leads. Or you might 
try looking at Q, determining what is needed to prove Q, and then what is needed to 
prove that, etc. Or you might do both of these, hoping to meet in the middle. However 
you go about working out the proof, once you understand it informally, you have only 
completed the “scratch work” stage of constructing the proof. Then comes the next 
stage, which is writing the proof in final form. No matter how convoluted a route 
you took in thinking up the proof, the final write-up should be direct and logical. In a 
direct proof, the write-up should start with P and go step by step until Q is reached. 
Therefore, this type of proof typically has the following form. 


Proof. Suppose that P is true. 


(argumentation) 


Then Q is true. 


We are now ready to give two simple examples of direct proof. We will put in 
more details here than one might normally include, in order to make each step as 
explicit as possible. We start with a definition concerning the integers. 


Definition 2.2.1. Let a and b be integers. The number a divides the number b if 
there is some integer q such that aq = b. If a divides b, we write a|b, and we say that 
a is a factor of b, and that b is divisible by a. △ 


Before discussing the content of Definition 2.2.1, we need to make an important 
remark about its logical structure. The definition says that “the number a divides the 
number b if ...,” where the ... describe a certain condition involving the numbers a 
and b. Strictly speaking, it would have been proper to write “if and only if” instead 
of just “if,” because it is certainly meant to be the case that if the condition does not 
hold, then we do not say that a divides b. However, it is customary in definitions 
to write “if” rather than “if and only if,” because it is taken as assumed that if the 
condition does not hold, then the term being defined cannot be applied. We will 
stick with the customary formulation of definitions, but it is important to think of 
definitions as meaning “if and only if.” 

To show the truth of a statement of the form “a|b,” it is necessary to find an 
integer q such that aq = b. Therefore, a statement of the form “a|b” is an existence 
statement. 

The expression “a|b” should not be confused with the fraction “a/b.” The latter 
is a number, whereas the former is a shorthand way of writing the statement “the 
integer a divides the integer b.” For example, even though it is not sensible to write 
the fraction 7/0, it is perfectly reasonable to write the expression 7|0, because 7 does 
in fact divide 0, because 7 · 0 = 0. Because of this potential confusion, and also to 
avoid ambiguous expressions such as 1/2+3 (is that \frac{1}{2} + 3 or \frac{1}{2+3}?), we suggest 
writing all fractions as \frac{a}{b} rather than a/b. 
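
The distinction between the statement a|b and the number a/b can be illustrated with a short Python sketch. The function `divides` is a hypothetical helper written for this illustration; it follows Definition 2.2.1 and treats the case a = 0 separately, because 0 · q = 0 for every integer q.

```python
def divides(a, b):
    """Return True when some integer q satisfies a * q == b (Definition 2.2.1)."""
    if a == 0:
        return b == 0          # 0 * q is always 0, so 0 divides only 0
    return b % a == 0


# "7|0" is a true statement, because 7 * 0 == 0.
print(divides(7, 0))           # True

# "7/0" is not a number: Python refuses to evaluate the fraction.
try:
    7 / 0
except ZeroDivisionError:
    print("7/0 is undefined")
```

The first call returns a truth value, whereas the second expression would have to produce a number; this mirrors the remark that a|b is a statement and a/b is not.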

We now have two simple results about divisibility. The proof of each theorem 
is preceded by scratch work, to show how one might go about formulating such a 
proof. 


Theorem 2.2.2. Let a, b and c be integers. If a|b and b|c, then a|c. 


Scratch Work. Our goal is to show that a|c, so that we need to find some integer 
k such that ak = c. We are free to choose any k that we can think of. Because a|b 
and b|c, there are integers q and r such that aq = b and br = c. Substituting the 
first equation into the second equation looks like a good idea to try, and we obtain 
(aq)r = c. By rearranging the left-hand side of this equation, we see that k = qr is a 
good guess. /// 


Proof. Suppose that a|b and b|c. Hence there are integers q and r such that aq = b 
and br = c. Define the integer k by k = qr. Then ak = a(qr) = (aq)r = br = c. 
Because ak = c, it follows that a|c. 


Compare the proof with the scratch work. The proof might not appear substantially 
better than the scratch work at first glance, and it might even seem a bit mysterious 
to someone who had not done the scratch work. Nonetheless, the proof is 
better than the scratch work, though in such a simple case the advantage might not 
be readily apparent. Unlike the scratch work, the proof starts with the hypotheses and 
proceeds logically to the conclusion, using the definition of divisibility precisely as 
stated. Later on we will see examples where the scratch work and the proof are more 
strikingly different. 
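
The witness k = qr found in the scratch work for Theorem 2.2.2 can be computed explicitly. The Python sketch below is only an illustration under our own naming (`quotient` and `transitivity_witness` are hypothetical helpers): given a, b and c with a|b and b|c, it recovers q and r and confirms that ak = c.

```python
def quotient(a, b):
    """Return an integer q with a * q == b, or None if a does not divide b."""
    if a == 0:
        return 0 if b == 0 else None   # 0 * q == b only when b == 0
    return b // a if b % a == 0 else None


def transitivity_witness(a, b, c):
    """Given a|b and b|c, return k = q * r, which satisfies a * k == c."""
    q = quotient(a, b)                 # aq = b
    r = quotient(b, c)                 # br = c
    if q is None or r is None:
        raise ValueError("hypotheses a|b and b|c are not satisfied")
    k = q * r
    assert a * k == c                  # ak = a(qr) = (aq)r = br = c
    return k


print(transitivity_witness(3, 12, 60))   # 20, since 3 * 20 == 60
```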


Theorem 2.2.3. Any integer divides zero. 


Scratch Work. In the statement of this theorem we are not given any particular 
choices of “variables,” in contrast to the previous theorem (which was stated in terms 
of a, b and c). To prove something about any possible integer, we pick an arbitrary 
one, say n. Then we need to show that n|0. It would certainly not suffice to choose 
one particular number, say 5, and then show that 5 divides 0. Once we have chosen 
an arbitrary n, the rest of the details in this proof are extremely simple. /// 


Proof. Let n be an integer. Observe that n · 0 = 0. Hence n|0. 


The first step in proving a theorem often involves reformulating it in a more 
useful way, such as choosing n in the above proof. 

The reader might be concerned that, in comparison to the scratch work for the 
above two theorems, the way we wrote the proofs involves “covering up our tracks.” 
Although it might appear that way, the purpose of the proper writing of proofs is 
not at all to hide anything, but rather to make sure that what seemed like a good 
idea intuitively is indeed logical. The only way to check whether a proof is really 
valid is to write it up properly, and such a write-up does not include a description 
of everything that went through your mind when you were figuring out the details 
of the proof. The final proof must stand on its own, with no reference to what was 
written in the scratch work. For example, not all arguments are reversible, and an 
argument that worked backwards during scratch work might not work when written 
forwards, and it is only by writing the proof properly that we find out if the idea 
really works. Intuitive thinking that may have been useful in formulating the proof 
should be replaced with logical deduction in the final written proof. 

In sum, there are two main steps to the process of producing a rigorous proof: 
formulating it and writing it. These two activities are quite distinct, though in some 
very simple and straightforward proofs you might formulate as you write. In most 
cases, you first formulate the proof (at least in outline form) prior to writing. For a 
difficult proof the relation between formulating and writing is essentially dialectical. 
You might formulate a tentative proof, try writing it up, discover some flaws, go back 
to the formulating stage and so on. 


Exercises 


Exercise 2.2.1. Outline the strategy for a direct proof of each of the following 
statements (do not prove them, because the terms are meaningless). 


(1) Let n be an integer. If 7|n, then n is bulbous. 

(2) Every globular integer is even. 

(3) If an integer is divisible by 13 and is greater than 100, then it is pesky. 
(4) An integer is both tactile and filigreed whenever it is odd. 


Exercise 2.2.2. Let n and m be integers. 


(1) Prove that 1|n. 
(2) Prove that n|n. 
(3) Prove that if m|n, then m|(—n). 


Exercise 2.2.3. Let n be an integer. 


(1) Prove that if n is even, then 3n is even. 
(2) Prove that if n is odd, then 3n is odd. 


Exercise 2.2.4. [Used in Theorem 2.3.5 and Theorem 2.4.1.] Let n be an integer. Prove 
that if n is even then n^2 is even, and if n is odd then n^2 is odd. 


Exercise 2.2.5. Let n and m be integers. Suppose that n and m are divisible by 3. 


(1) Prove that n+ m is divisible by 3. 
(2) Prove that nm is divisible by 3. 


Exercise 2.2.6. Let a, b, c, m and n be integers. Prove that if a|b and a|c, then 
a|(bm + cn). 


Exercise 2.2.7. Let a, b, c and d be integers. Prove that if a|b and c|d, then ac|bd. 


Exercise 2.2.8. Let a and b be integers. Prove that if a|b, then a^n|b^n for all positive 
integers n. (There is no need for mathematical induction here.) 


2.3 Proofs by Contrapositive and Contradiction 


In this section we discuss two strategies for proving statements of the form P → Q. 
Both these strategies are a bit more convoluted than direct proof, but in some 
situations they are nonetheless easier to work with. A less than perfect analogy might 
be when the straightest road between two cities leads up and down a mountain and 
through difficult terrain, whereas a curved road might at first seem to be going in 
the wrong direction, but in fact it bypasses the mountain and is ultimately easier and 
quicker than the straight road. 

There is no foolproof method for knowing ahead of time whether a proof on 
which you are working should be a direct proof or a proof by one of these other 
methods. Experience often allows for an educated guess as to which strategy to try 
first. In any case, if one strategy does not appear to bear fruit, then another strategy 
should be attempted. It is only when the proof is completed that we know whether a 
given choice of strategy works. 

Both strategies discussed in this section rely on ideas from our discussion of 
equivalence of statements in Section 1.3. For our first method, recall that the 
contrapositive of P → Q, the statement ¬Q → ¬P, is equivalent to P → Q. Hence, in order 
to prove P → Q, we could just as well prove ¬Q → ¬P, which we would do by the 
method of direct proof. We construct such a proof by assuming that Q is false, and 
then, in the final write-up, presenting a step-by-step argument going from ¬Q to ¬P. 
A proof of this sort is called proof by contrapositive. This type of proof typically 
has the following form. 


Proof. Suppose that Q is false. 


(argumentation) 


Then P is false. 


The following proof is a simple example of proof by contrapositive. 


Theorem 2.3.1. Let n be an integer. If n^2 is odd, then n is odd. 


Scratch Work. If we wanted to use a direct proof, we would have to start with the 
assumption that n^2 is odd. Then there would be some integer j such that n^2 = 2j + 1. 
It is not clear, however, how to proceed from this point, so instead we try proof by 
contrapositive. Such a proof would involve assuming that n is not odd, which implies 
that it is even, and then deducing that n^2 is even, which implies that it is not odd. We 
start such a proof by observing that if n is even, then there is some integer k such that 
n = 2k, and we then compute n^2 in terms of k, leading to the desired result. /// 


Proof. Suppose that n is even. Then there is some integer k such that n = 2k. Hence 
n^2 = (2k)^2 = 4k^2 = 2(2k^2). Because 2k^2 is an integer, it follows that n^2 is even. By 
contrapositive, we see that if n^2 is odd then n is odd. 


In the above proof we mentioned that we used proof by contrapositive. In general, 
it is often helpful to the reader to have the method of proof stated explicitly. 

Another method of proof for theorems with statements of the form P → Q, which 
looks similar to proof by contrapositive but is actually distinct from it, is proof by 
contradiction. 

Logicians use the term “proof by contradiction” to mean the proof of a statement 
A by assuming ¬A, then reaching a contradiction, and then deducing that A must be 
true. For our purposes, we are interested in proof by contradiction for the special 
case where the statement A has the form P → Q, because that is how mathematical 
theorems are formulated. We now take a closer look at this particular type of proof 
by contradiction. 

Recall from Section 1.3 that ¬(P → Q) is equivalent to P ∧ ¬Q. Suppose that we 
could prove that P ∧ ¬Q is false. It would follow that ¬(P → Q) is false, and hence 
that ¬(¬(P → Q)) is true. Then, using Double Negation (Fact 1.3.2 (1)), we could 
conclude that P → Q is true. 

The method of proof by contradiction is to show that P → Q is true by assuming 
that P ∧ ¬Q is true, and then deriving a logical contradiction, by which we mean, as 
discussed in Section 1.2, a statement that cannot be true under any circumstances; 
often such statements have the form B ∧ ¬B for some statement B. Once we reach a 
contradiction, we conclude that P ∧ ¬Q is false, and then as above we deduce that 
P → Q is true. 

Another way to think of proof by contradiction is to observe from the truth table 
for P → Q that the only way for this statement to be false is if P is true and Q is false, 
that is, if P is true and ¬Q is true. Hence, if we assume both of these, and then derive 
a contradiction, we would know that P → Q cannot be false; hence P → Q must be 
true. 
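
The two equivalences on which this section relies, namely that P → Q is equivalent to ¬Q → ¬P, and that ¬(P → Q) is equivalent to P ∧ ¬Q, can be confirmed by running through the four rows of the truth table. The following Python sketch does so; `implies` is a helper defined here for illustration, using the standard truth table for the conditional.

```python
from itertools import product


def implies(p, q):
    """Truth value of p -> q: false only when p is true and q is false."""
    return (not p) or q


rows = list(product([True, False], repeat=2))   # all four (P, Q) assignments

# P -> Q agrees with its contrapositive (not Q) -> (not P) in every row.
contrapositive_ok = all(implies(p, q) == implies(not q, not p) for p, q in rows)

# not (P -> Q) agrees with P and (not Q) in every row.
negation_ok = all((not implies(p, q)) == (p and not q) for p, q in rows)

print(contrapositive_ok, negation_ok)           # True True
```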

A proof by contradiction typically has the following form. 


Proof. We prove the result by contradiction. Suppose that P is true and that Q is 
false. 


(argumentation) 


We have therefore reached a contradiction. Therefore P implies Q. 


We now turn to a simple example of proof by contradiction. It is a good idea to 
start such a proof by stating that you are using this strategy. 


Theorem 2.3.2. The only consecutive non-negative integers a, b and c that satisfy 
a^2 + b^2 = c^2 are 3, 4 and 5. 

Scratch Work. The statement of this theorem has the form P → Q, because it can be 
restated as “if a, b and c are consecutive non-negative integers such that a^2 + b^2 = c^2, 
then a, b and c are 3, 4 and 5.” It is hard to prove the result directly, because we are 
trying to prove that something does not exist. Rather, we will assume that consecutive 
integers a, b and c, other than 3, 4 and 5, exist and satisfy a^2 + b^2 = c^2, and we 
will then derive a contradiction. Also, we observe that if a, b and c are consecutive 
integers, then b = a + 1 and c = a + 2. /// 


Proof. We prove the result by contradiction. Suppose that a, b and c are non-negative 
consecutive integers other than 3, 4 and 5, and that a^2 + b^2 = c^2. Because a, b and 
c are not 3, 4 and 5, we know that a ≠ 3, and because the three numbers are 
consecutive, we know that b = a + 1 and c = a + 2. From a^2 + b^2 = c^2 we deduce that 
a^2 + (a + 1)^2 = (a + 2)^2. After expanding and rearranging we obtain a^2 − 2a − 3 = 0. 
This equation factors as (a − 3)(a + 1) = 0. Hence a = 3 or a = −1. We have 
already remarked that a ≠ 3, and we know a is non-negative. Therefore we have a 
contradiction, and the theorem is proved. 
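
Theorem 2.3.2 can be spot-checked by a brute-force search. The Python sketch below, written under our own choice of function name and search bound, lists every non-negative a up to the bound for which a, a + 1 and a + 2 satisfy the equation. The search is finite, so it supports the theorem without proving it.

```python
def consecutive_solutions(bound):
    """All (a, a+1, a+2) with 0 <= a <= bound and a^2 + (a+1)^2 == (a+2)^2."""
    return [
        (a, a + 1, a + 2)
        for a in range(bound + 1)
        if a**2 + (a + 1)**2 == (a + 2)**2
    ]


print(consecutive_solutions(1000))   # [(3, 4, 5)]
```

The single hit agrees with the factorization (a − 3)(a + 1) = 0 in the proof, whose only non-negative root is a = 3.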


Our next two theorems are both famous results that have well-known proofs by 
contradiction. These clever proofs are much more difficult than what we have seen 
so far, and are more than would be expected of a student to figure out on her own at 
this point. 

Our first result involves irrational numbers, which we will shortly define. 
Irrational numbers are a type of real number, and so we need to assume informal 
knowledge of the real numbers, just as we assumed informal knowledge of the integers 
in Sections 2.1 and 2.2. The real numbers are the collection of all the numbers that 
are generally used in elementary mathematics (not including the complex numbers), 
and they have operations addition, subtraction, multiplication and division, and these 
operations satisfy standard properties such as the Commutative Law for addition and 
multiplication. See the Appendix for a brief summary of some of the standard prop- 
erties of real numbers. We now turn to the matter at hand. 


Definition 2.3.3. Let x be a real number. The number x is a rational number if there 
exist integers n and m such that m ≠ 0 and x = \frac{n}{m}. If x is not a rational number, it is 
an irrational number. △ 


Observe that if x is a rational number, then there are many different fractions of 
the form \frac{n}{m} such that x = \frac{n}{m}. Given any fraction \frac{n}{m} such that n ≠ 0, we can always 
reduce it to “lowest terms,” by which we mean that the numerator and denominator 
have no common factors other than 1 and −1. See the Appendix for a reference, 
where this fact about rational numbers is stated as Theorem A.6. 

Are there any irrational numbers? Though it is not at all obvious, there are in fact 
infinitely many of them, and in a certain sense there are more irrational numbers than 
rational ones, as will be made precise in Section 6.7. 

At this point, however, we will have to be satisfied with verifying that irrational 
numbers exist. In particular, we will prove that √2 is an irrational number. To us this 
fact may seem rather innocuous, though when first discovered it was something of a 
shock. The result was discovered by someone in the Pythagorean school in ancient 
Greece (possibly the sixth century B.C.E.). This school, centered around the figure 
of Pythagoras, was dedicated to mathematics as well as various mystical beliefs. 
Among other things, the Pythagoreans believed in the importance of whole numbers, 
and held that anything meaningful in the universe could be related to whole numbers 
or to ratios of whole numbers. The ancient Greeks tended to think of numbers 
geometrically, and they probably did not think of √2 as an algebraically defined object, 
as we do today. However, by using the Pythagorean Theorem, we see that if a square 
has sides of length 1, then the diagonal of the square will have length √2. Hence 
√2 would be a geometrically meaningful number to the Pythagoreans, and therefore 
they were very disturbed to discover that this number was not expressible as a ratio 
of whole numbers. One legend has it that the discoverer of this fact, in despair, threw 
himself overboard from a ship. 

Before we state and prove our theorem about √2, we need a proper definition for 
this number. 


Definition 2.3.4. Let p be a positive real number. The square root of p, denoted 
√p, is a positive real number x such that x^2 = p. △ 


Our goal is to prove that √2 is an irrational number, but there is a more 
fundamental question about √2 that needs to be addressed first, which is whether it exists. 
Definition 2.3.4 states that if there is a number denoted √2, it would be a positive 
real number x such that x^2 = 2, but nothing in the definition guarantees that such a 
number x exists. Clearly, if there is no such real number x, it would make no sense 
to try to prove that such a number is irrational. In fact, as expected, it is indeed true 
that there is a positive real number x such that x^2 = 2 (and there is only one such 
number), but unfortunately it is beyond the scope of this book to give a proof of that 
fact. The proof requires tools from real analysis; see [Blo11, Theorem 2.6.9] for a 
proof. 

Assuming that √2 exists, however, we can prove here that this number is 
irrational. Observe that the following theorem is self-contained, and does not rely upon 
a proof that √2 exists; it only says that if √2 exists, then it is irrational. 


Theorem 2.3.5. There is no rational number x such that x^2 = 2. 


Preliminary Analysis. The statement of our theorem says that something does not 
exist, which is hard to prove directly. However, we can easily reformulate the state- 
ment to avoid that problem, because to say that there is no rational number with a 
certain property means that if a real number has that property, that number cannot 
be rational. That is, we can reformulate our theorem as “if x is a real number and 
x^2 = 2, then x is irrational,” which has the familiar form P → Q. We then use proof 
by contradiction, which we start by assuming that x is a real number such that x^2 = 2, 
and also that x is not irrational (and hence it is rational). /// 


Proof. Let x be a real number. Suppose that x^2 = 2, and that x is rational. We will 
derive a contradiction. Because x is rational, there are integers n and m such that 
x = \frac{n}{m}. Observe that n ≠ 0. If \frac{n}{m} is not in lowest terms, then we could cancel any 
common factors, bringing it to lowest terms. There is no problem assuming that this 
has been done already, and so we may assume that n and m have no common factors 
other than 1 and −1. 

Because x^2 = 2, then (\frac{n}{m})^2 = 2. It follows that \frac{n^2}{m^2} = 2, and hence n^2 = 2m^2. We 
now ask whether n is even or odd. If n were odd, then using Exercise 2.2.4 we would 
see that n^2 would be odd. This last statement is not possible, because n^2 = 2m^2, and 
2m^2 must be even, because it is divisible by 2. It follows that n cannot be odd; hence n 
must be even. Therefore there is some integer k such that n = 2k. Then (2k)^2 = 2m^2, 
so that 4k^2 = 2m^2, and therefore 2k^2 = m^2. By an argument similar to the one used 
above, we see that m is even. We therefore conclude that both n and m are even. We 
have therefore reached a contradiction, because any two even numbers have 2 as a 
common factor, and yet we assumed that n and m have no common factors other than 
1 and −1. Hence x is not rational. 
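
The key equation n^2 = 2m^2 in the proof can be explored numerically. The Python sketch below, under our own naming and with an arbitrary search bound, looks for positive integers n and m satisfying it. The search finds none, as the theorem predicts, though no finite search can replace the proof.

```python
from math import isqrt


def fractions_squaring_to_two(bound):
    """All (n, m) with 1 <= m <= bound and n > 0 such that n^2 == 2 * m^2."""
    hits = []
    for m in range(1, bound + 1):
        target = 2 * m * m
        n = isqrt(target)            # largest integer n with n * n <= target
        if n * n == target:          # exact equality would make n/m a square root of 2
            hits.append((n, m))
    return hits


print(fractions_squaring_to_two(1000))   # []
```

Integer arithmetic is used throughout, so the empty result is not an artifact of floating-point rounding.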


The proof of Theorem 2.3.5 is mentioned (without details) in Aristotle’s “Prior 
Analytics” (1.23), and is presumed to be of earlier origin; perhaps it is the proof used 
by the Pythagoreans (though they would not have formulated it as we do). 

Our second famous result involves prime numbers, and has a proof by contradic- 
tion for a subpart of a proof by contradiction. We will make use of the definition of 
divisibility given in Section 2.2. 


Definition 2.3.6. Let p be an integer greater than 1. The number p is a prime 
number if the only positive integers that divide p are 1 and p. The number p is a 
composite number if it is not a prime number. △ 


The first few prime numbers are 2, 3, 5, 7, 11, .... The study of prime numbers is 
quite old and very extensive; see any book on elementary number theory, for example 
[Ros05], for details. 

The number 1 is not considered to be either prime or composite. On the one hand, 
the only positive integers that divide 1 are 1 and itself, which would make it seem as 
if 1 were a prime number. However, the prime numbers are always defined as being 2 
or larger to avoid special cases and awkward statements of theorems. For example, if 
1 were a prime number, then the factorization of integers into prime numbers would 
not be unique, and uniqueness would hold only for “factorization into prime numbers 
other than 1,” which is cumbersome to state. On the other hand, the number 1 is not 
considered composite, because there are no positive integers other than 1 and itself 
that divide it. 

Whereas we restrict our attention to the integers greater than 1 when we discuss 
prime numbers and composite numbers, some authors consider negative numbers 
such as −2, −3, −5, ... to be prime numbers, and similarly for composite numbers. 
Moreover, the term “prime” is used in the more general context of rings, a structure 
that is studied in abstract algebra, and that includes the integers as a special case; see 
any introductory abstract algebra text, for example [Fra03], for details. 

Observe that a composite number n can always be written as n = ab for some 
positive integers a and b such that 1 < a < n and 1 < b < n. 

How many prime numbers are there? In particular, are there only finitely many 
prime numbers, or infinitely many? The following theorem answers this question. 
The proof we give is very commonly used, and goes back to Euclid; see [Rib96, 
Chapter 1] for further discussion, as well as some other nice proofs of this theorem. 


Theorem 2.3.7. There are infinitely many prime numbers. 


Preliminary Analysis. We have not yet seen a rigorous treatment of what it means 
for there to be infinitely many of something, and so for now we need to use this 
concept in an intuitive fashion. A thorough discussion of finite vs. infinite is found 
in Chapter 6. The essential idea discussed in that chapter is that if a collection of 
objects can be listed in the form a_1, a_2, ..., a_n for some positive integer n, then the 
collection of objects is finite; if the collection of objects cannot be described by any 
such list, then it is infinite. In Chapter 6 we will see a rigorous formulation of this 
idea in terms of sets and functions, but this intuitive explanation of finite vs. infinite 
completely captures the rigorous definition. 

To say that there are infinitely many prime numbers means that there is no list 
of the form P_1, P_2, ..., P_n, for any positive integer n, that contains all prime numbers. 
It is easier to prove this statement if we reformulate it as “if n is a positive integer, 
and P_1, P_2, ..., P_n are prime numbers, then P_1, P_2, ..., P_n does not include all prime 
numbers.” The proof of this last statement is by contradiction. /// 


Proof. Let n be a positive integer, and let P_1, P_2, ..., P_n be a collection of prime 
numbers. Suppose that P_1, P_2, ..., P_n contains all prime numbers. 

Let Q = (P_1 × P_2 × ··· × P_n) + 1. We will show that Q is a prime number. 
Because Q is clearly larger than any of the numbers P_1, P_2, ..., P_n, it will follow that Q 
is a prime number that is not in the collection P_1, P_2, ..., P_n, and we will therefore 
know that the collection P_1, P_2, ..., P_n does not contain all prime numbers, which is 
a contradiction. It will then follow that if n is a positive integer, and P_1, P_2, ..., P_n are 
prime numbers, then P_1, P_2, ..., P_n does not include all prime numbers, and we will 
conclude that there are infinitely many prime numbers. 

To show that Q is a prime number, we use proof by contradiction. Suppose that 
Q is not a prime number. Therefore Q is a composite number. By Theorem 6.3.10 we 
deduce that Q has a factor that is a prime number. (Though this theorem comes later 
in the text, because it needs some tools we have not yet developed, it does not use 
the result we are now proving, and so it is safe to use.) The only prime numbers are 
$P_1, P_2, \ldots, P_n$, and therefore one of these numbers must be a factor of $Q$. Suppose that
$P_k$ is a factor of $Q$, for some integer $k$ such that $1 \le k \le n$. Therefore there is some
integer $R$ such that $P_k R = Q$. Hence

\[ P_k R = (P_1 \times P_2 \times \cdots \times P_n) + 1, \]

and therefore

\[ P_k [R - (P_1 \times \cdots \times P_{k-1} \times P_{k+1} \times \cdots \times P_n)] = 1. \]

It follows that $P_k$ divides 1. However, the only integers that divide 1 are 1 and $-1$.
(We will not provide a proof of this last fact; it is stated as Theorem A.4 in the
Appendix.) Because $P_k$ is a prime number it cannot possibly equal 1 or $-1$, which is
a contradiction. We deduce that $Q$ is not a composite number, and hence it is a prime
number.


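The proof of Theorem 2.3.7 actually yields more than just what the statement of
the theorem says; it in fact gives an explicit procedure for producing arbitrarily many
prime numbers. Some care is needed, however: the proof shows that $Q$ is a prime
number only under the (false) assumption that $P_1, P_2, \ldots, P_n$ are all the prime numbers.
What the argument shows without that assumption is that none of $P_1, P_2, \ldots, P_n$
divides $Q$, and hence that every prime factor of $Q$ is a prime number that is not in the
collection $P_1, P_2, \ldots, P_n$. We start by letting $P_1 = 2$, which is the smallest prime number.
We then let $P_2 = P_1 + 1 = 3$, and then $P_3 = (P_1 \times P_2) + 1 = 7$, and then
$P_4 = (P_1 \times P_2 \times P_3) + 1 = 43$, all of which happen to be prime numbers. The next
number, $(P_1 \times P_2 \times P_3 \times P_4) + 1 = 1807 = 13 \times 139$, is not a prime number, and so
we let $P_5 = 13$, which is its smallest prime factor. Continuing in this way, at each stage
taking a prime factor of one more than the product of the prime numbers found so far,
we could produce as many prime numbers as we liked. This process is not entirely
satisfying, however, both because it does not yield a simple explicit formula for $P_n$ as a
function of $n$, and also because this process skips over many prime numbers. In fact,
no one has yet found a simple procedure to produce all prime numbers.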
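
The construction in the proof of Theorem 2.3.7 can be tried out on a computer. The following is a minimal Python sketch, not part of the proof; the function names are our own. One caution is built into it: the number obtained by adding 1 to a product of prime numbers need not itself be prime, and the argument guarantees only that its prime factors are new. The sketch therefore takes the smallest prime factor at each stage.

```python
def smallest_prime_factor(q: int) -> int:
    """Return the smallest prime factor of q (for q >= 2), by trial division."""
    d = 2
    while d * d <= q:
        if q % d == 0:
            return d
        d += 1
    return q  # no divisor up to sqrt(q), so q itself is prime


def euclid_primes(count: int) -> list[int]:
    """Produce `count` distinct primes, starting from 2, as in Euclid's argument."""
    primes = [2]
    while len(primes) < count:
        product = 1
        for p in primes:
            product *= p
        # No prime found so far divides product + 1, so this factor is new.
        primes.append(smallest_prime_factor(product + 1))
    return primes


print(euclid_primes(5))  # [2, 3, 7, 43, 13]
```

The fifth value is 13 rather than 1807, because $1807 = 13 \times 139$ is composite.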

We conclude this section with the observation that proof by contradiction implic- 
itly uses Double Negation, which ultimately relies upon the Law of the Excluded 
Middle, which says that any statement is either true or false. (See Section 1.2 for 
more discussion of this issue.) Any mathematician who does not believe in the Law 
of the Excluded Middle would therefore object to proof by contradiction. There are 
such mathematicians, though the majority of mathematicians, including the author 
of this book, are quite comfortable with the Law of the Excluded Middle, and hence 
with proof by contradiction.


Exercises


Exercise 2.3.1. For each of the statements in Exercise 2.2.1, outline the strategy for 
a proof by contrapositive, and the strategy for a proof by contradiction (do not prove 
the statements, because the terms are meaningless). 


Exercise 2.3.2. Let $n$ be an integer. Prove that if $n^2$ is even, then $n$ is even.


Exercise 2.3.3. Let a, b and c be integers. Prove that if a does not divide bc, then a 
does not divide b. 


Exercise 2.3.4. [Used in Theorem 6.7.4.] Prove that the product of a non-zero rational 
number and an irrational number is irrational. 


Exercise 2.3.5. Let a, b and c be integers. Suppose that there is an integer d such 
that d|a and d|b, but that d does not divide c. Prove that the equation ax + by = c has 
no solution such that x and y are integers. 


Exercise 2.3.6. Let $c$ be an integer. Suppose that $c \ge 2$, and that $c$ is not a prime
number. Prove that there is an integer $b$ such that $b \ge 2$, that $b \mid c$ and that $b \le \sqrt{c}$.


Exercise 2.3.7. Let $q$ be an integer. Suppose that $q \ge 2$, and that for any integers $a$
and $b$, if $q \mid ab$ then $q \mid a$ or $q \mid b$. Prove that $\sqrt{q}$ is irrational.


Exercise 2.3.8. Let $q$ be an integer. Suppose that $q \ge 2$, and that for any integers
$a$ and $b$, if $q \mid ab$ then $q \mid a$ or $q \mid b$. Prove that $q$ is a prime number. (The converse to
this statement is also true, though it is harder to prove; see [Dea66, Section 3.6] for 
details, though note that his use of the term “prime,” while keeping with the standard 
usage in ring theory, is not the same as ours.) 


2.4 Cases, and If and Only If 


The notion of equivalence of statements, as discussed in Section 1.3, has already 
been seen to be useful in proving theorems, for example in proof by contrapositive. 
In this section we will make use of some other equivalences of statements to prove 
certain types of theorems. 

One commonly used method for proving a statement of the form $P \to Q$ is by
breaking up the proof into a number of cases (and possibly subcases, subsubcases
and so on). Formally, we use proof by cases when the premise $P$ can be written in the
form $A \lor B$. We then use Exercise 1.3.2 (6) to see that $(A \lor B) \to Q$ is equivalent to
$(A \to Q) \land (B \to Q)$. Hence, in order to prove that a statement of the form $(A \lor B) \to Q$
is true, it is sufficient to prove that each of the statements $A \to Q$ and $B \to Q$ is true.
The use of this strategy often occurs when proving a statement involving a quantifier 
of the form “for all x in U,” and where no single proof can be found for all such x, but 
where U can be divided up into two or more parts, and where a proof can be found 
for each part. 
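
The equivalence behind proof by cases involves only three statements, and so it can be checked mechanically by running through all eight assignments of truth values. The following Python sketch does this; the helper name implies is our own choice, and the check is an illustration of the equivalence rather than a replacement for the exercise cited above.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of the conditional p -> q."""
    return (not p) or q


# Compare (A or B) -> Q with (A -> Q) and (B -> Q) on every assignment
# of truth values to A, B and Q.
cases_equivalence = all(
    implies(a or b, q) == (implies(a, q) and implies(b, q))
    for a, b, q in product([False, True], repeat=3)
)
print(cases_equivalence)  # True
```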

For the following simple example of proof by cases, recall the definition of even 
and odd integers in Section 2.1.


Theorem 2.4.1. Let $n$ be an integer. Then $n^2 + n$ is even.


Scratch Work. Because we know about sums and products of even numbers
and odd numbers, it seems like a good idea to try breaking up the proof into two
cases, one case where $n$ is even and one case where $n$ is odd. Formally, let $A$ =
“$n$ is an even integer,” let $B$ = “$n$ is an odd integer” and let $Q$ = “$n^2 + n$ is even.”
Then the theorem has the form $(A \lor B) \to Q$. We will prove the theorem by proving
that $(A \to Q)$ and $(B \to Q)$ are both true; each of these statements will be proved
as a separate case. The proof of this theorem could be done either by making use 
of Theorem 2.1.3 and Exercise 2.2.4, or from scratch; because the latter is simple 
enough, we will do that. /// 


Proof. Case 1: Suppose that $n$ is even. By definition we know that there is some
integer $k$ such that $n = 2k$. Hence

\[ n^2 + n = (2k)^2 + 2k = 4k^2 + 2k = 2(2k^2 + k). \]

Because $k$ is an integer, so is $2k^2 + k$. Therefore $n^2 + n$ is even.

Case 2: Suppose that $n$ is odd. By definition we know that there is some integer $j$
such that $n = 2j + 1$. Hence

\[ n^2 + n = (2j + 1)^2 + (2j + 1) = (4j^2 + 4j + 1) + (2j + 1) = 4j^2 + 6j + 2 = 2(2j^2 + 3j + 1). \]

Because $j$ is an integer, so is $2j^2 + 3j + 1$. Therefore $n^2 + n$ is even.


It is not really necessary to define A and B explicitly as we did in the scratch work 
for Theorem 2.4.1, and we will not do so in the future, but it was worthwhile doing 
it once, just to see how the equivalence of statements is being used. 

In the proof of Theorem 2.4.1 we had two cases, which together covered all pos- 
sibilities, and which were exclusive of each other. It is certainly possible to have more 
than two cases, and it is also possible to have non-exclusive cases; all that is needed 
is that all the cases combined cover all possibilities. The proof of Theorem 2.4.4 
below has two non-exclusive cases. 

We now turn to theorems that have statements of the form $P \to (A \lor B)$. Such
theorems are less common than the previously discussed type, but do occur, and it is
worth being familiar with the standard proof strategies for such theorems. There are
two commonly used strategies, each one being advantageous in certain situations.
One approach would be to use the contrapositive together with De Morgan’s Law
(Fact 1.3.2 (13)), which together imply that $P \to (A \lor B)$ is equivalent to
$(\neg A \land \neg B) \to \neg P$. The other would be to use Exercise 1.3.2 (5), which says that
$P \to (A \lor B)$ is equivalent to $(P \land \neg A) \to B$. The roles of $A$ and $B$ could also be
interchanged in this last statement. The second approach is more commonly used, and
so we use it in the following proof, although in this particular case the first approach
would work quite easily, as the reader should verify.


Theorem 2.4.2. Let $x$ and $y$ be real numbers. If $xy$ is irrational, then $x$ or $y$ is
irrational.


Scratch Work. The statement of this theorem has the form $P \to (A \lor B)$. We
will prove $(P \land \neg A) \to B$, which we do by assuming that $xy$ is irrational and that $x$ is
rational, and deducing that $y$ is irrational. ///


Proof. Suppose that $xy$ is irrational and that $x$ is rational. Hence $x = \frac{a}{b}$ for some
integers $a$ and $b$ such that $b \ne 0$. We will show that $y$ is irrational, by using proof by
contradiction. Suppose that $y$ is rational. It follows that $y = \frac{m}{n}$ for some integers $m$
and $n$ such that $n \ne 0$. Hence $xy = \frac{am}{bn}$, and $bn \ne 0$, contradicting the fact that $xy$ is
irrational. Hence $y$ is irrational.
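
The two strategies for statements of the form $P \to (A \lor B)$ rest on logical equivalences that can be confirmed by a truth table. The following Python sketch (with helper names of our own choosing) compares the original form with the contrapositive form and with the form used in the proof of Theorem 2.4.2. It also shows that the converse is not an equivalent statement.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of the conditional p -> q."""
    return (not p) or q


def equivalent(f, g, num_vars: int) -> bool:
    """True if f and g agree on every assignment of truth values."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=num_vars))


def original(p, a, b):
    return implies(p, a or b)                     # P -> (A or B)


def contrapositive_form(p, a, b):
    return implies((not a) and (not b), not p)    # (not A and not B) -> not P


def second_form(p, a, b):
    return implies(p and (not a), b)              # (P and not A) -> B


def converse(p, a, b):
    return implies(a or b, p)                     # (A or B) -> P


print(equivalent(original, contrapositive_form, 3))  # True
print(equivalent(original, second_form, 3))          # True
print(equivalent(original, converse, 3))             # False
```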


Having discussed the appearance of $\lor$ in the statements of theorems, we could
also consider the appearance of $\land$, though these occurrences are more straightforward.
As expected, a theorem with statement of the form $(A \land B) \to Q$ is proved by
assuming $A$ and $B$, and using both of these statements to derive $Q$. To prove a
theorem with statement of the form $P \to (A \land B)$, we can use Exercise 1.3.2 (4), which
states that $P \to (A \land B)$ is equivalent to $(P \to A) \land (P \to B)$. Hence, to prove a
theorem with statement of the form $P \to (A \land B)$, we simply prove each of $P \to A$ and
$P \to B$, again as expected.

Not only are there a variety of ways to structure proofs, but there are also variants 
in the logical form of the statements of theorems. Whereas the most common logical 
form of the statement of a theorem is $P \to Q$, as we have discussed so far, another
common form is $P \leftrightarrow Q$. We refer to such theorems as “if and only if” theorems
(often abbreviated “iff” theorems). To prove such a theorem, we make use of the fact
that $P \leftrightarrow Q$ is equivalent to $(P \to Q) \land (Q \to P)$, as was shown in Fact 1.3.2 (11).
Hence, to prove a single statement of the form $P \leftrightarrow Q$, it is sufficient to prove the two
statements $P \to Q$ and $Q \to P$, each of which can be proved using any of the methods
we have seen so far. We now give a typical example of such a proof; it is sufficiently 
straightforward so that we dispense with the scratch work. Recall the definition of 
divisibility of integers in Section 2.2. 


Theorem 2.4.3. Let $a$ and $b$ be non-zero integers. Then $a \mid b$ and $b \mid a$ if and only if
$a = b$ or $a = -b$.


Proof. 


$\Rightarrow$. Suppose that $a \mid b$ and $b \mid a$. Because $a \mid b$, there is some integer $m$ such that
$am = b$, and because $b \mid a$, there is some integer $k$ such that $bk = a$. Substituting this
last equation into the previous one, we obtain $(bk)m = b$, and hence $b(km) = b$.
Because $b \ne 0$, it follows that $km = 1$. Because $k$ and $m$ are integers, then either
$k = 1$ and $m = 1$, or $k = -1$ and $m = -1$. (We will not provide a proof of this last
fact; it is stated as Theorem A.4 in the Appendix.) In the former case $a = b$, and in
the latter case $a = -b$.


$\Leftarrow$. Suppose that $a = b$ or $a = -b$. First, suppose that $a = b$. Then $a \cdot 1 = b$, so
$a \mid b$, and $b \cdot 1 = a$, so $b \mid a$. Similarly, suppose that $a = -b$. Then $a \cdot (-1) = b$, so $a \mid b$,
and $b \cdot (-1) = a$, so $b \mid a$.

Our next example of an if and only if theorem combines a number of the methods
we have discussed so far. 


Theorem 2.4.4. Let m and n be integers. Then mn is odd if and only if both m and n 
are odd. 


Scratch Work. The “$\Leftarrow$” part of this theorem, which is the “if” part, says that if $m$
and n are both odd, then mn is odd. This implication will be straightforward to prove, 
using the definition of odd integers. 

The “$\Rightarrow$” part of this theorem, which is the “only if” part, says that if $mn$ is odd,
then both $m$ and $n$ are odd. A direct proof of this part of the theorem would start with
the assumption that $mn$ is odd, which would mean that $mn = 2p + 1$ for some integer
p, but it is not clear how to go from there to the desired conclusion. It is easier to 
make assumptions about m and n and proceed from there, so we will prove this part 
of the theorem by contrapositive, in which case we assume that m and n are not both 
odd, and deduce that mn is not odd. When we assume that m and n are not both odd, 
we will have two (overlapping) cases to consider, namely, when m is even or when 
n is even. Alternatively, it would be possible to make use of three non-overlapping 
cases, which are when m is even and n is odd, when m is odd and n is even, and 
when m and n are both even; however, the proof is no simpler as a result of the non- 
overlapping cases, and in fact the proof would be longer with these three cases rather 
than the two overlapping ones as originally proposed, and so we will stick with the
latter. ///


Proof.


$\Leftarrow$. Suppose that $m$ and $n$ are both odd. Hence there is an integer $j$ such that
$m = 2j + 1$, and there is an integer $k$ such that $n = 2k + 1$. Therefore

\[ mn = (2j + 1)(2k + 1) = 4jk + 2j + 2k + 1 = 2(2jk + j + k) + 1. \]

Because $k$ and $j$ are integers, so is $2jk + j + k$. Therefore $mn$ is odd.


$\Rightarrow$. Suppose that $m$ and $n$ are not both odd. We will deduce that $mn$ is not odd,
and the desired result will follow by contrapositive. If $m$ and $n$ are not both odd, then
at least one of them is even. Suppose first that $m$ is even. Then there is an integer $p$
such that $m = 2p$. Hence $mn = (2p)n = 2(pn)$. Because $p$ and $n$ are integers, so is
$pn$. Therefore $mn$ is even. Next assume that $n$ is even. The proof in this case is similar
to the previous case, and we omit the details.


A slightly more built-up version of an if and only if theorem is a theorem that 
states that three or more statements are all mutually equivalent. Such theorems often 
include the phrase “the following are equivalent,” sometimes abbreviated “TFAE.”
The following theorem, which involves $2 \times 2$ matrices, is an example of this type of
result. For the reader who is not familiar with matrices, we summarize the relevant
notation. A $2 \times 2$ matrix is a square array of numbers of the form $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ for
some real numbers $a$, $b$, $c$ and $d$. The determinant of such a matrix is defined by
$\det M = ad - bc$, and the trace of the matrix is defined by $\operatorname{tr} M = a + d$. An upper
triangular $2 \times 2$ matrix has the form $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ for some real numbers $a$, $b$ and $d$. See
any introductory text on linear algebra, for example [AR05, Chapters 1 and 2], for
the relevant information about matrices.


Theorem 2.4.5. Let $M = \begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ be an upper triangular $2 \times 2$ matrix. Suppose that $a$,
$b$ and $d$ are integers. The following are equivalent.


a. $\det M = 1$.
b. $a = d = \pm 1$.
c. $\operatorname{tr} M = \pm 2$ and $a = d$.


What Theorem 2.4.5 says is that (a) if and only if (b), that (a) if and only if (c), 
and that (b) if and only if (c). Hence, to prove these three if and only if statements 
we would in principle need to prove that (a) $\Rightarrow$ (b), that (b) $\Rightarrow$ (a), that (a) $\Rightarrow$ (c), that
(c) $\Rightarrow$ (a), that (b) $\Rightarrow$ (c), and that (c) $\Rightarrow$ (b). In practice we do not always need to
prove six separate statements. The idea is to use the transitivity of logical implication,
which follows from Fact 1.3.1 (12). For example, suppose that we could prove that
(a) $\Rightarrow$ (b), that (b) $\Rightarrow$ (c), and that (c) $\Rightarrow$ (a); the other three implications would then
hold automatically. We could just as well prove that (a) $\Rightarrow$ (c), that (c) $\Rightarrow$ (b), and that
(b) $\Rightarrow$ (a), if that were easier. Another way to prove the theorem would be to prove
that (a) $\Rightarrow$ (b), that (b) $\Rightarrow$ (a), that (a) $\Rightarrow$ (c), and that (c) $\Rightarrow$ (a). It is sufficient to prove
any collection of logical implications from which the remaining logical implications 
can be deduced using transitivity; the choice of what to prove and what to deduce 
depends upon the particular theorem being proved. Similar reasoning holds when 
more than three statements are being proved equivalent. 
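
Whether a proposed collection of implications is enough can be checked by following chains of implications. The following Python sketch is one way to do so, under the assumption that each proved implication is recorded as a pair (x, y), read “x implies y”; the function name and this representation are ours. It repeatedly applies transitivity until nothing new is obtained, and then tests whether every statement implies every other.

```python
def all_equivalent(statements: list[str], proved: list[tuple[str, str]]) -> bool:
    """Decide whether the proved implications yield, by transitivity,
    every implication between the given statements."""
    # reachable[s] is the set of statements known to follow from s.
    reachable = {s: {s} for s in statements}
    changed = True
    while changed:
        changed = False
        for x, y in proved:
            for s in statements:
                # If s implies x and x implies y, then s implies y.
                if x in reachable[s] and y not in reachable[s]:
                    reachable[s].add(y)
                    changed = True
    return all(set(statements) <= reachable[s] for s in statements)


abc = ["a", "b", "c"]
print(all_equivalent(abc, [("a", "b"), ("b", "c"), ("c", "a")]))              # True
print(all_equivalent(abc, [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")]))  # True
print(all_equivalent(abc, [("a", "b"), ("b", "c")]))                          # False
```

The last call fails because nothing proved leads back to (a), so that, for example, (c) $\Rightarrow$ (a) cannot be deduced.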


Proof of Theorem 2.4.5. We will prove that (a) $\Rightarrow$ (b), that (b) $\Rightarrow$ (c), and that
(c) $\Rightarrow$ (a).

(a) $\Rightarrow$ (b). Suppose that $\det M = 1$. Hence $ad - b \cdot 0 = 1$, and therefore $ad = 1$.
Because both $a$ and $d$ are integers, it must be the case that either $a = 1$ and $d = 1$, or
$a = -1$ and $d = -1$, using Theorem A.4.


(b) $\Rightarrow$ (c). Suppose that $a = d = \pm 1$. First, suppose that $a = d = 1$. Then
$\operatorname{tr} M = a + d = 2$. Second, suppose that $a = d = -1$. Then $\operatorname{tr} M = a + d = -2$. Hence
$\operatorname{tr} M = \pm 2$ and $a = d$.


(c) $\Rightarrow$ (a). Suppose that $\operatorname{tr} M = \pm 2$ and $a = d$. We can rewrite $\operatorname{tr} M = \pm 2$ as
$a + d = \pm 2$. Hence $4 = (a + d)^2 = a^2 + 2ad + d^2$. Because $a = d$, then $a^2 = ad = d^2$,
and therefore $4 = 4ad$. It follows that $ad = 1$. Because $\det M = ad - b \cdot 0 = ad$, we
deduce that $\det M = 1$.


Exercises 


Exercise 2.4.1. Outline the strategy for a proof of each of the following statements 
(do not prove them, because the terms are meaningless). 


(1) If an integer is combustible then it is even or prime. 
(2) A $2 \times 2$ matrix is collapsible if and only if its determinant is greater than 3.
(3) For an integer to be putrid, it is necessary and sufficient that it is both odd and
divisible by 50.
(4) Let $n$ be an integer. The following are equivalent: (a) the integer $n$ is composite
and greater than 8; (b) the integer $n$ is suggestive; (c) the integer $n$ is
indifferent or fragile.


Exercise 2.4.2. Let $a$, $b$ and $c$ be integers. Suppose that $c \ne 0$. Prove that $a \mid b$ if and
only if $ac \mid bc$.


Exercise 2.4.3. [Used in Exercise 4.4.8, Exercise 6.7.9 and Section 8.8.] Let $a$ and $b$ be
integers. The numbers $a$ and $b$ are relatively prime if the following condition holds:
if $n$ is an integer such that $n \mid a$ and $n \mid b$, then $n = \pm 1$. See Section 8.2 for further
discussion and references.


(1) Find two integers $p$ and $q$ that are relatively prime. Find two integers $c$ and $d$
that are not relatively prime.
(2) Prove that the following are equivalent.
a. $a$ and $b$ are relatively prime.
b. $a$ and $-b$ are relatively prime.
c. $a + b$ and $b$ are relatively prime.
d. $a - b$ and $b$ are relatively prime.


Exercise 2.4.4. Let $n$ be an integer. Prove that one of the two numbers $n$ and $n + 1$ is
even, and the other is odd. (You may use the fact that every integer is even or odd.) 


Exercise 2.4.5. It follows from Corollary 5.2.5, using n = 3, that if a is an integer, 
then precisely one of the following holds: either $a = 3k$ for some integer $k$, or
$a = 3k + 1$ for some integer $k$, or $a = 3k + 2$ for some integer $k$.

Let n and m be integers. 


(1) Suppose that 3 divides n, and that 3 does not divide m. Prove that 3 does not 
divide n+ m. 
(2) Prove that 3 divides mn if and only if 3 divides m or 3 divides n. 


Exercise 2.4.6. Are there any integers p such that p > 1, and such that all three 
numbers p, p+ 2 and p+ 4 are prime numbers? If there are such triples, prove that 
you have all of them; if there are no such triples, prove why not. Use the discussion 
at the start of Exercise 2.4.5. 


Exercise 2.4.7. Let n be an integer. Using only the fact that every integer is even 
or odd, and without using Corollary 5.2.5, prove that precisely one of the following 
holds: either $n = 4k$ for some integer $k$, or $n = 4k + 1$ for some integer $k$, or $n = 4k + 2$
for some integer $k$, or $n = 4k + 3$ for some integer $k$.


Exercise 2.4.8. Let n be an integer. Suppose that n is odd. Prove that there is an 
integer $k$ such that $n^2 = 8k + 1$.


Exercise 2.4.9. Let $x$ be a real number. Define the absolute value of $x$, denoted $|x|$,
by

\[ |x| = \begin{cases} x, & \text{if } 0 \le x \\ -x, & \text{if } x < 0. \end{cases} \]

Let $a$ and $b$ be real numbers. Prove the following statements.


(1) $|-a| = |a|$.
(2) $|a|^2 = a^2$.
(3) $|a - b| = |b - a|$.
(4) $|ab| = |a| |b|$.


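Exercise 2.4.10. Let $x$ and $y$ be real numbers. Let $x \curlywedge y$ and $x \curlyvee y$ be defined by

\[ x \curlywedge y = \begin{cases} x, & \text{if } x \ge y \\ y, & \text{if } x \le y, \end{cases} \qquad \text{and} \qquad x \curlyvee y = \begin{cases} y, & \text{if } x \ge y \\ x, & \text{if } x \le y. \end{cases} \]

(Observe that $x \curlywedge y$ is simply the maximum of $x$ and $y$, and $x \curlyvee y$ is the minimum,
though our notation is more convenient for the present exercise than writing
$\max\{x, y\}$ and similarly for the minimum.)

Let $a$, $b$ and $c$ be real numbers. Prove the following statements. The definition of
absolute value is given in Exercise 2.4.9.


(1) $(a \curlywedge b) + (a \curlyvee b) = a + b$.
(2) $(a \curlywedge b) + c = (a + c) \curlywedge (b + c)$ and $(a \curlyvee b) + c = (a + c) \curlyvee (b + c)$.
(3) $(a \curlywedge b) \curlywedge c = a \curlywedge (b \curlywedge c)$ and $(a \curlyvee b) \curlyvee c = a \curlyvee (b \curlyvee c)$.
(4) $(a \curlywedge b) - (a \curlyvee b) = |a - b|$.
(5) $a \curlywedge b = \frac{1}{2}(a + b + |a - b|)$ and $a \curlyvee b = \frac{1}{2}(a + b - |a - b|)$.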


2.5 Quantifiers in Theorems 


A close look at the theorems we have already seen, and those we will be seeing, 
shows that quantifiers (as discussed in Section 1.5) appear in the statements of many 
theorems—implicitly if not explicitly. The presence of quantifiers, and especially 
multiple quantifiers, in the statements of theorems is a major source of error in the 
construction of valid proofs by beginners. So, extra care should be taken with the 
material in this section; mastering it now will save much difficulty later on. Before 
proceeding, it is worth reviewing the material in Section 1.5. Though we will not 
usually invoke them by name, to avoid distraction, the rules of inference for quanti- 
fiers discussed in Section 1.5 are at the heart of much of what we do with quantifiers 
in theorems. 

We start by considering statements with a single universal quantifier, that is,
statements of the form “$(\forall x \text{ in } U)P(x)$.” Many of the theorems we have already seen have
this form, even though the expression “for all” might not appear in their statements.
For example, Theorem 2.3.1 says “Let $n$ be an integer. If $n^2$ is odd, then $n$ is odd.”
This statement implicitly involves a universal quantifier, and it can be rephrased as
“For all integers $n$, if $n^2$ is odd, then $n$ is odd.” In order to prove that something is true
for all integers, we picked an arbitrary integer that we labeled $n$ (any other symbol
would do), and proved the result for this arbitrarily chosen integer $n$. It was crucial
that we picked an arbitrary integer $n$, rather than a specific integer, for example 7. It
is true that $7^2 = 49$ is odd, and that 7 is odd, but checking this one particular case
does not tell us anything about what happens in all the other cases where $n$ is an
integer with $n^2$ odd.

More generally, suppose that we want to prove a theorem with statement of the
form $(\forall x \text{ in } U)P(x)$. The key observation is that the statement “$(\forall x \text{ in } U)P(x)$” is
equivalent to “if $x$ is in $U$, then $P(x)$ is true.” This latter statement has the form
$A \to B$, and it can be proved by any of the methods discussed previously. A direct
proof for $(\forall x \text{ in } U)P(x)$ would therefore proceed by choosing some arbitrary $x_0$ in $U$,
and then deducing that $P(x_0)$ holds. Phrases such as “let $x_0$ be in $U$” are often used
at the start of an argument to indicate an arbitrary choice of $x_0$. This type of proof
typically has the following form.


Proof. Let $x_0$ be in $U$.


(argumentation)


Then $P(x_0)$ is true.


Again, we stress that it is crucial in this type of proof that an arbitrary $x_0$ in
U is picked, not some particularly convenient value. It is not possible to prove that 
something is true for all values in U by looking at only one (or more) particular 
cases. In terms of rules of inference, look closely at the discussion of the variable in 
the Universal Generalization rule of inference in Section 1.5. 

For example, a well-known function due to Leonhard Euler is defined by the 
formula $f(n) = n^2 + n + 41$ for all integers $n$. If you substitute the numbers $n =
0, 1, 2, \ldots, 39$ into this function, you obtain the numbers 41, 43, 47, ..., 1601, all
of which are prime numbers. It therefore might appear that substituting in every
positive integer into this function would result in a prime number (which would be
a very nice property), but it turns out that $f(40) = 1681 = 41^2$, which is not prime.
See [Rib96, p. 199] for more discussion of this, and related, functions. The point is 
that if you want to prove that a statement is true for all x in U, it does not suffice to 
try only some of the possible values of x. 
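
The behavior of Euler's function is easy to confirm by direct computation. The following Python sketch, in which the primality test by trial division is our own helper, checks the first forty values and then locates the first value of $n$ at which $f(n)$ fails to be prime.

```python
def is_prime(m: int) -> bool:
    """Primality test by trial division; adequate for small numbers."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def f(n: int) -> int:
    """Euler's function f(n) = n^2 + n + 41."""
    return n * n + n + 41


first_forty_prime = all(is_prime(f(n)) for n in range(40))
first_failure = next(n for n in range(1000) if not is_prime(f(n)))

print(first_forty_prime)                # True
print(first_failure, f(first_failure))  # 40 1681
```

Forty successful checks, in other words, tell us nothing about the forty-first.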

Statements of the form $(\forall x \text{ in } U)P(x)$ can be proved by strategies other than direct
proof. For example, the proof of such a statement using proof by contradiction
typically has the following form.


Proof. We use proof by contradiction. Let $y_0$ be in $U$. Suppose that $P(y_0)$ is false.


(argumentation) 


Then we arrive at a contradiction.


We will not show here any examples of proofs of statements of the form
$(\forall x \text{ in } U)P(x)$, because we have already seen a number of such proofs in the previous
sections of this chapter.

We now consider statements with a single existential quantifier, that is, statements
of the form “$(\exists x \text{ in } U)P(x)$.” Using the Existential Generalization rule of inference in
Section 1.5, we see that to prove a theorem of the form $(\exists x \text{ in } U)P(x)$ means that we need
to find some $z_0$ in $U$ such that $P(z_0)$ holds. It does not matter if there are actually
many $x$ in $U$ such that $P(x)$ holds; we need to produce only one of them to prove
existence. A proof of “$(\exists x \text{ in } U)P(x)$” can also be viewed as involving a statement
of the form $A \to B$. After we produce the desired object $z_0$ in $U$, we then prove the
statement “if $x = z_0$, then $P(x)$ is true.” Such a proof typically has the following form.


Proof. Let $z_0 = \ldots$.


(argumentation)


Then $z_0$ is in $U$.


(argumentation)


Then $P(z_0)$ is true.


How we find the element $z_0$ in the above type of proof is often of great interest,
and sometimes is the bulk of the effort we spend in figuring out the proof, but it is
not part of the actual proof itself. We do not need to explain how we found $z_0$ in
the final write-up of the proof. The proof consists only of defining $z_0$, and showing
that $z_0$ is in $U$, and that $P(z_0)$ is true. It is often the case that we find $z_0$ by going
backwards, that is, assuming that $P(z_0)$ is true, and seeing what $z_0$ has to be. However,
this backwards work is not the same as the actual proof, because, as we shall see, 
not all mathematical arguments can be reversed—what works backwards does not 
necessarily work forwards. 

We now turn to a simple example of a proof involving an existential quantifier. 
Recall the definitions concerning $2 \times 2$ matrices prior to Theorem 2.4.5. We say that
a $2 \times 2$ matrix $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ has integer entries if $a$, $b$, $c$ and $d$ are integers.


Proposition 2.5.1. There exists a $2 \times 2$ matrix $A$ with integer entries such that
$\det A = 4$ and $\operatorname{tr} A = 7$.


Scratch Work. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. The condition $\det A = 4$ means that $ad - bc = 4$; the
condition $\operatorname{tr} A = 7$ means that $a + d = 7$. We have two equations with four unknowns.
Substituting $d = 7 - a$ into the first equation and rearranging, we obtain
$a^2 - 7a + (bc + 4) = 0$. Applying the quadratic formula yields

\[ a = \frac{7 \pm \sqrt{33 - 4bc}}{2}. \]

Because we want $a$, $b$, $c$ and $d$ to be integers, we need to find integer values of $b$ and
$c$ such that $33 - 4bc$ is the square of an odd integer. Trial and error shows that $b = 2$
and $c = 3$ yield either $a = 5$ and $d = 2$, or $a = 2$ and $d = 5$. (There are other possible
solutions, for example $b = -2$ and $c = 2$, but we do not need them.) ///


Proof. Let $A = \begin{pmatrix} 5 & 2 \\ 3 & 2 \end{pmatrix}$. Then $\det A = 5 \cdot 2 - 2 \cdot 3 = 4$, and $\operatorname{tr} A = 5 + 2 = 7$.


The difference between the scratch work and the actual proof for the above 
proposition is quite striking, as often occurs in proofs of theorems involving exis- 
tential quantifiers. In the scratch work we went backwards, by which we mean that 
we started with the desired conclusion, in this case the assumption that there is some 
matrix A as desired, and proceeded to find out what criteria would then be imposed 
on a, b, c, d. We then found a, b, c, d that satisfy these criteria. Such a procedure was 
helpful, but it could not be our final proof, because we needed to show that the matrix 
A existed; we were not asked to show what could be said about A if it existed, which 
is what we did in the scratch work. To show that the desired matrix A existed, we 
simply had to produce it, and then show that it satisfied the requisite properties re- 
garding its determinant and trace. This is what we did in the proof. How we produced 
A is irrelevant to the final proof (though not to our understanding of matrices). It is 
important that the actual proof reads “forwards,” not backwards. Moreover, because 
we were asked to show only that A existed, and not describe how many possible ma- 
trices A there were, we needed to exhibit only one value of A in the actual proof, even 
though we knew that there was more than one possibility from our scratch work. Not 
everything we learn in the scratch work is necessarily needed in the final proof. 
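
The contrast between finding a matrix and verifying it can be seen in a short computation. The following Python sketch searches a small range of integer entries for matrices with determinant 4 and trace 7; the range from $-3$ to 7 is an arbitrary choice of ours, and a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is represented by the tuple (a, b, c, d). The search plays the role of the scratch work; the proof itself corresponds only to the final check of a single matrix.

```python
from itertools import product


def det(a: int, b: int, c: int, d: int) -> int:
    """Determinant of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c


def trace(a: int, b: int, c: int, d: int) -> int:
    """Trace of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return a + d


# Scratch work: search a small range of integer entries (range chosen arbitrarily).
solutions = [
    entries
    for entries in product(range(-3, 8), repeat=4)
    if det(*entries) == 4 and trace(*entries) == 7
]

# The proof: exhibit one matrix and verify its two properties directly.
A = (5, 2, 3, 2)
print(det(*A), trace(*A))  # 4 7
print(A in solutions)      # True
print(len(solutions) > 1)  # True
```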

Backwards proofs are so common, especially in elementary mathematics, that 
unfortunately they are often unnoticed by students, and rarely criticized by instruc- 
tors. Whereas backwards proofs might not produce any real harm in elementary 
mathematics, it is crucial to avoid them in advanced mathematics, where questions 
of logical implication are often much trickier. 

Let us examine two simple examples of backwards proofs. First, suppose that we 
are asked to solve the equation 7x + 6 = 21 + 4x. A typical solution submitted by a 
high school student might look like 


\begin{equation*}
\begin{aligned}
7x + 6 &= 21 + 4x \\
3x - 15 &= 0 \\
3x &= 15 \\
x &= 5.
\end{aligned}
\tag{2.5.1}
\end{equation*}


There is nothing wrong with the algebra here, and indeed x = 5 is the correct solution. 
For computational purposes such a write-up is fine, but logically it is backwards. We 
were asked to find the solutions to the original equation. A solution to an equation is 
a number that can be plugged into the equation to obtain a true statement. To solve 
an equation in the variable x, we simply have to produce a collection of numbers, 
which we then plug into the equation one at a time, verifying that each one makes 
the equation a true statement when plugged in. How these solutions are found is
logically irrelevant (though, of course, of great pedagogical interest). A logically
correct “forwards” write-up of the solution to 7x + 6 = 21 + 4x would be as follows. 


“Let $x = 5$. Plugging $x = 5$ into the left-hand side of the equation yields
$7x + 6 = 7 \cdot 5 + 6 = 41$, and plugging it into the right-hand side of the equation yields
$21 + 4x = 21 + 4 \cdot 5 = 41$. Therefore $x = 5$ is a solution. Because the equation is
linear, it has at most one solution. Hence $x = 5$ is the only solution.”


Such a write-up seems ridiculously long and overly pedantic, given the simplic- 
ity of the original equation, and in practice no one would (or should) write such a 
solution. Logically, however, it is the correct form for the solution to the problem as 
stated. The backwards approach in Equation 2.5.1 did happen to produce the correct 
solution to our problem, because all steps in this particular case are reversible. Not 
all computations are reversible, however, as we now see. 

Suppose that we are asked to solve the equation 


\[ \sqrt{x^2 - 5} = \sqrt{x + 1}, \]


where, as is common in high school, we consider only real number solutions. A 
typical (and backwards) write-up might look like 


J/e—S5=Vx41 


x—5=x+1 
x =2-6=0 
(x —3)(x+2) =0 
x=3 or x=-2. 
The above write-up is definitely not correct, because x = —2 is not a solution to the 
original equation. In fact, itis not even possible to substitute x = —2 into either side of 


the original equation, because we cannot take the square root of negative numbers. 
The source of the error in the write-up is that not every step in it is reversible; it 
is left to the reader to figure out which step cannot be reversed. In an elementary 
course such as high school algebra or calculus, it would suffice to write up the above 
computation, and then observe that $x = -2$ should be dropped. In more rigorous 
proofs, however, it is best to stick to logically correct writing, in order to avoid errors 
that might otherwise be hard to spot. In your scratch work you can go forwards, 
backwards, sideways or any combination of these; in the final write-up, however, a 
proof should always go forwards, starting with the hypothesis and ending up with 
the desired conclusion. 
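
The forwards check described above can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function names are hypothetical, and the list of candidates is simply the output of the backwards scratch work. Each candidate is plugged into both sides of the original equation, and it is kept only if both sides are defined as real numbers and agree.

```python
import math

def lhs(x):
    # Left-hand side sqrt(x^2 - 5); math.sqrt raises ValueError on negative input.
    return math.sqrt(x * x - 5)

def rhs(x):
    # Right-hand side sqrt(x + 1).
    return math.sqrt(x + 1)

def is_solution(x):
    """Return True only if both sides are real numbers and they agree."""
    try:
        return math.isclose(lhs(x), rhs(x))
    except ValueError:
        # The square root of a negative number is not a real number.
        return False

candidates = [3, -2]  # produced by the backwards scratch work
solutions = [x for x in candidates if is_solution(x)]
print(solutions)
```

The candidate that survives is the one for which every step of the backwards computation can be reversed.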

Returning to our discussion of existence results, one variant on such results 
concerns theorems that involve existence and uniqueness, of which the following 
theorem is an example. This theorem concerns 2 × 2 matrices, as discussed prior to 
Theorem 2.4.5. This time we need some additional aspects of matrices, namely, the 
2 × 2 identity matrix $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and matrix multiplication. It would take us too far 
afield to define matrix multiplication here; we assume that the reader is familiar 
with such multiplication. See any introductory text on linear algebra, for example 
[AR05, Chapter 1], for information about matrix multiplication. It is easy to verify 
that AI = A = IA for any 2 × 2 matrix A. It can also be verified (by a slightly tedious 
computation) that (AB)C = A(BC) for any three 2 × 2 matrices A, B and C. 

The following theorem concerns inverse matrices. Given a 2 × 2 matrix A, an 
inverse matrix for A is a 2 × 2 matrix B such that AB = I = BA. Does every 2 × 2 
matrix have an inverse matrix? The answer is no. For example, the matrix a) has 
no inverse matrix, as the reader may verify (by supposing it has an inverse matrix, 
and seeing what happens). The following theorem gives a very useful criterion for 
the existence of inverse matrices. In fact, the criterion is both necessary and sufficient 
for the existence of inverse matrices, and its analog holds for square matrices of any 
size, but we will not prove these stronger results. 


Theorem 2.5.2. Let A be a 2 × 2 matrix such that $\det A \neq 0$. Then A has a unique 
inverse matrix. 


The phrase “A has a unique inverse matrix” means that an inverse matrix for A 
exists, and that only one such inverse matrix exists. The logical notation for such a 
statement is $(\exists! x)P(x)$, where “$\exists! x$” means “there exists unique x.” To prove such 
a statement, we need to prove two things, namely, existence and uniqueness, and it 
is usually best to prove each of these two things separately. It makes no difference 
which part is proved first. To prove existence, we proceed as before, and produce 
an example of the desired object. To prove uniqueness, the standard strategy is to 
assume that there are two objects of the sort we are looking for, and then show that 
they are the same. (It is also possible to assume that there are two different objects of 
the sort we are looking for, and then arrive at a contradiction by showing that the two 
objects are actually the same, but there is rarely any advantage to using this alternative 
strategy.) 


Scratch Work for Theorem 2.5.2. We start with the uniqueness part of the proof, 
to show that it really is independent of the existence part of the proof. To prove 
uniqueness, we assume that A has two inverse matrices, say B and C, and then use the 
properties of matrices cited above, together with the definition of inverse matrices, to 
show that B = C. The proof of existence is rather different. A backwards calculation 
to try to find an inverse matrix for A would be as follows. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Suppose 
that $B = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$ is an inverse matrix of A. Then BA = I and AB = I. The latter equality 
says 

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]

which yields 

\[
\begin{pmatrix} ax + bz & ay + bw \\ cx + dz & cy + dw \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]

This matrix equation yields the four equations 

\[
\begin{aligned}
ax + bz &= 1 \\
ay + bw &= 0 \\
cx + dz &= 0 \\
cy + dw &= 1,
\end{aligned}
\]

where x, y, z and w are to be thought of as the variables and a, b, c and d are to be 
thought of as constants. We then solve for x, y, z and w in terms of a, b, c and d. 
The solution to these four equations turns out to be $x = \frac{d}{ad - bc}$ and $y = \frac{-b}{ad - bc}$ and 
$z = \frac{-c}{ad - bc}$ and $w = \frac{a}{ad - bc}$. Because $\det A = ad - bc$, we see why the hypothesis that 
$\det A \neq 0$ is necessary. /// 


Proof of Theorem 2.5.2. Uniqueness: Suppose that A has two inverse matrices, say 
B and C. Then AB = I = BA and AC = I = CA. Using standard properties of matrix 
multiplication, we then compute 


B = BI = B(AC) = (BA)C = IC =C. 


Because B = C, we deduce that A has a unique inverse. 
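Existence: Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. The condition $\det A \neq 0$ means that $ad - bc \neq 0$. Let B be 
the 2 × 2 matrix defined by 

\[
B = \begin{pmatrix} \frac{d}{ad - bc} & \frac{-b}{ad - bc} \\ \frac{-c}{ad - bc} & \frac{a}{ad - bc} \end{pmatrix}.
\]

Then 

\[
AB = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} \frac{d}{ad - bc} & \frac{-b}{ad - bc} \\ \frac{-c}{ad - bc} & \frac{a}{ad - bc} \end{pmatrix}
= \begin{pmatrix} \frac{ad}{ad - bc} + \frac{-bc}{ad - bc} & \frac{-ab}{ad - bc} + \frac{ab}{ad - bc} \\ \frac{cd}{ad - bc} + \frac{-cd}{ad - bc} & \frac{-bc}{ad - bc} + \frac{ad}{ad - bc} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I.
\]

A similar calculation shows that BA = I. Hence B is an inverse matrix of A. 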
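
As a sanity check on the existence part of the proof, the formula for B can be tried on a concrete matrix. The following Python sketch is an illustration only: the helper names `mat_mul` and `inverse` and the sample matrix are our own choices, and exact rational arithmetic (`fractions.Fraction`) is used so that no rounding occurs. Checking one matrix does not replace the proof, which covers every 2 × 2 matrix with non-zero determinant.

```python
from fractions import Fraction

def mat_mul(p, q):
    """Multiply two 2 x 2 matrices given as ((p11, p12), (p21, p22))."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(m):
    """Build the matrix B from the proof of Theorem 2.5.2, assuming det != 0."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("det A = 0, so the formula does not apply")
    return ((d / det, -b / det), (-c / det, a / det))

IDENTITY = ((1, 0), (0, 1))
A = ((1, 2), (3, 4))  # sample matrix with det A = -2
B = inverse(A)
print(mat_mul(A, B) == IDENTITY and mat_mul(B, A) == IDENTITY)
```

Both products are checked, because the definition of an inverse matrix requires AB = I and BA = I.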


An understanding of quantifiers is also useful when we want to prove that a 
given statement is false. Suppose that we want to prove that a statement of the form 
“$(\forall x \text{ in } U)P(x)$” is false. We saw in Section 1.5 that $\neg[(\forall x \text{ in } U)P(x)]$ is equivalent 
to $(\exists x \text{ in } U)(\neg P(x))$. To prove that the original statement is false, it is sufficient to 
prove that $(\exists x \text{ in } U)(\neg P(x))$ is true. Such a proof would work exactly the same as 
any other proof of a statement with an existential quantifier, that is, by finding some 
$x_0$ in U such that $\neg P(x_0)$ is true, which means that $P(x_0)$ is false. The element $x_0$ is 
called a “counterexample” to the original statement $(\forall x \text{ in } U)P(x)$. 

For example, suppose that we want to prove that the statement “all prime numbers 
are odd” is false. The statement has the form $(\forall x)Q(x)$, where x has values in the 
integers, and where Q(x) = “if x is prime, then it is odd.” Using the reasoning above, 
it is sufficient to prove that $(\exists x)(\neg Q(x))$ is true. Using Fact 1.3.2 (14), we see that 
$\neg Q(x)$ is equivalent to “x is prime, and it is not odd.” Hence, we need to find some 
integer $x_0$ such that $x_0$ is prime, and it is not odd, which would be a counterexample 
to the original statement. The number $x_0 = 2$ is just such a number (and in fact it is 
the only even prime number, though we do not need that fact). This example is so 
simple that it may seem unnecessary to go through a lengthy discussion of it, but our 
point is to illustrate the general approach. 
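
The search for a counterexample can also be carried out mechanically over a small range of integers. The following Python sketch is only an illustration: the function names `is_prime` and `q` and the range searched are our own choices. Note that one counterexample suffices to show that the universal statement is false, whereas a search that finds no counterexample in a finite range would prove nothing.

```python
def is_prime(n):
    """Trial division; adequate for the small range searched here."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

def q(x):
    # Q(x): "if x is prime, then x is odd", that is, (not prime) or odd.
    return (not is_prime(x)) or (x % 2 == 1)

# Look for integers x0 for which Q(x0) is false.
counterexamples = [x for x in range(-10, 50) if not q(x)]
print(counterexamples)
```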

Similar considerations can be used to prove that a statement of the form $(\exists y)R(y)$ 
is false. It is often very hard to show directly that something does not exist, 
because one would have to examine all possible cases, and show that none of them 
have the desired property. Rather, we use the fact that $\neg[(\exists y)R(y)]$ is equivalent to 
$(\forall y)(\neg R(y))$, and we prove this last statement by our usual methods. 

Finally, we look at theorems with statements that involve more than one quantifier. 
Such theorems might typically have the form $(\forall y)(\exists x)P(x, y)$ or $(\exists a)(\forall b)Q(a, b)$. 
We saw in Section 1.5 that there are eight possible ways of forming statements with 
two quantifiers, and clearly with more than two quantifiers there are many more pos- 
sibilities. There is no point in giving detailed instructions on how to proceed for each 
different combination of quantifiers, both because there would be too many cases to 
consider, and because one single strategy works in all cases: take one quantifier at a 
time, from the outside in. The following two simple results are typical examples of 
this strategy. 


Proposition 2.5.3. For every real number a, there exists a real number b such that 
$a^2 - b^2 + 4 = 0$. 


Scratch Work. This proposition has the form $(\forall a)(\exists b)(a^2 - b^2 + 4 = 0)$, where a 
and b are real numbers. To prove this proposition, we start with the outside quantifier, 
which is $\forall a$. We can rewrite the statement to be proved as $(\forall a)Q(a)$, where Q(a) = 
“$(\exists b)(a^2 - b^2 + 4 = 0)$.” To prove the statement $(\forall a)Q(a)$, which is a statement with 
a single universal quantifier, we proceed as before, namely, by picking an arbitrary 
real number $a_0$, and then showing that $Q(a_0)$ holds. Therefore we need to show that 
$(\exists b)((a_0)^2 - b^2 + 4 = 0)$ is true for the given $a_0$. Again, we have a statement with 
one quantifier, this time an existential quantifier, and we do a backwards computation 
to solve for b, which yields $b = \pm\sqrt{(a_0)^2 + 4}$, though we need only one of these 
solutions. As always, we now write the proof forwards, to make sure that everything 
is correct. /// 




Proof. Let $a_0$ be a real number. Let $b_0 = \sqrt{(a_0)^2 + 4}$. Then 

\[
(a_0)^2 - (b_0)^2 + 4 = (a_0)^2 - \left(\sqrt{(a_0)^2 + 4}\right)^2 + 4 = 0.
\]

Hence, for each real number $a_0$, we found a real number $b_0$ such that 
$(a_0)^2 - (b_0)^2 + 4 = 0$. 


Proposition 2.5.4. There exists a real number x such that $(3 - x)(y^2 + 1) > 0$ for all 
real numbers y. 




Scratch Work. This proposition has the form $(\exists x)(\forall y)((3 - x)(y^2 + 1) > 0)$, where 
x and y are real numbers. Again, we start with the outside quantifier, which is $\exists x$. We 
rewrite the statement to be proved as $(\exists x)R(x)$, where R(x) = “$(\forall y)((3 - x)(y^2 + 1) > 0)$.” 
We prove the statement $(\exists x)R(x)$ by producing a single real number $x_0$ for which 
$R(x_0)$ holds. That is, we need to find a real number $x_0$ such that $(\forall y)((3 - x_0)(y^2 + 1) > 0)$ 
is true, and hence we need to find a real number $x_0$, such that if we pick an 
arbitrary real number $y_0$, then $(3 - x_0)((y_0)^2 + 1) > 0$ will hold. Again we do our 
scratch work backwards. Observe that $(y_0)^2 + 1 > 0$ for all real numbers $y_0$, and that 
$3 - x_0 > 0$ for all $x_0 < 3$. We need to pick a single value of $x_0$ that works, and we 
randomly pick $x_0 = 2$. /// 


Proof. Let $x_0 = 2$. Let $y_0$ be a real number. Observe that $(y_0)^2 + 1 > 0$. Then 

\[
(3 - x_0)((y_0)^2 + 1) = (3 - 2)((y_0)^2 + 1) > 0.
\]

Hence, we have found a real number $x_0$ such that $(3 - x_0)((y_0)^2 + 1) > 0$ for all real 
numbers $y_0$. 


As discussed in Section 1.5, the order of the quantifiers in the statement of a 
theorem often matters. The statement of Proposition 2.5.3 is “For every real number 
a, there exists a real number b such that $a^2 - b^2 + 4 = 0$,” which is 
$(\forall a)(\exists b)(a^2 - b^2 + 4 = 0)$. If we were to reverse the quantifiers, we would obtain 
$(\exists b)(\forall a)(a^2 - b^2 + 4 = 0)$, which in English would read “there is a real number b 
such that $a^2 - b^2 + 4 = 0$ for all real numbers a.” This last statement is not true, which 
we can demonstrate by showing that its negation is true. Using Fact 1.5.1 (2), it follows 
that $\neg[(\exists b)(\forall a)(a^2 - b^2 + 4 = 0)]$ is equivalent to $(\forall b)(\exists a)(a^2 - b^2 + 4 \neq 0)$. To 
prove this latter statement, let $b_0$ be an arbitrary real number. We then choose 
$a_0 = b_0$, in which case $(a_0)^2 - (b_0)^2 + 4 = 4 \neq 0$. Hence the negation of the statement 
is true, so the statement is false. We therefore see that the order of the quantifiers in 
Proposition 2.5.3 does matter. On the other hand, changing the order of the quantifiers 
in the statement of Proposition 2.5.4, while changing the meaning of the statement, 
does not make it become false, as the reader may verify. 
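
The role of the order of the quantifiers can be seen in how the witnesses are chosen, which the following Python sketch illustrates on a few sample values. The function name `f` and the sample values are our own choices, and a finite sample proves nothing by itself; the point is only that in the first loop the witness $b_0$ is computed from $a_0$, whereas in the second loop the witness $a_0$ is computed from $b_0$.

```python
import math

def f(a, b):
    # The expression a^2 - b^2 + 4 from Proposition 2.5.3.
    return a * a - b * b + 4

# (for all a)(there exists b): the witness b0 is allowed to depend on a0.
for a0 in [-3.0, 0.0, 1.5, 10.0]:
    b0 = math.sqrt(a0 * a0 + 4)
    # Floating-point arithmetic is inexact, so compare with a small tolerance.
    assert math.isclose(f(a0, b0), 0.0, abs_tol=1e-9)

# Negation of the reversed statement, (for all b)(there exists a):
# for each b0 the choice a0 = b0 gives the value 4, which is not 0.
for b0 in [-7, 0, 2, 100]:
    a0 = b0
    assert f(a0, b0) == 4
```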


Exercises 


Exercise 2.5.1. Convert the following statements, which do not have their quanti- 
fiers explicitly written, into statements with explicit quantifiers (do not prove them, 
because the terms are meaningless). 


(1) If a 5 × 5 matrix has positive determinant then it is bouncy. 

(2) There is a crusty integer that is greater than 7. 

(3) For each integer k, there is an opulent integer w such that k|w. 

(4) There is a fibrous 2 x 2 matrix P such that det P > m, for each ribbed integer 
m. 

(5) Some 2 x 2 matrix M has the property that every subtle integer divides trM. 


Exercise 2.5.2. A problem that might be given in a high school mathematics class is 
“Prove that the equation $e^x = 5$ has a unique solution.” We could rewrite the problem 
as “Prove that there exists a unique real number x such that $e^x = 5$.” First, write 
up a solution to the problem as would be typically found in a high school class. 
Second, write up a proper solution to the problem, using the ideas discussed in this 
section. Write up the uniqueness first, without making use of the existence part of 
the proof; avoid a backwards proof when showing existence. Do not use a calculator 
(the number x does not have to be given explicitly in decimal expansion). 


Exercise 2.5.3. Prove or give a counterexample to each of the following statements. 


(1) For each non-negative number s, there exists a non-negative number t such 
that s > t. 

(2) There exists a non-negative number t such that for all non-negative numbers 
s, the inequality s > t holds. 

(3) For each non-negative number t, there exists a non-negative number s such 
that s > t. 

(4) There exists a non-negative number s such that for all non-negative numbers 
t, the inequality s > t holds. 


Exercise 2.5.4. Prove or give a counterexample to each of the following statements. 


(1) For each integer a, there exists an integer b such that a|b. 
(2) There exists an integer b such that for all integers a, the relation a|b holds. 
(3) For each integer b, there exists an integer a such that a|b. 
(4) There exists an integer a such that for all integers b, the relation a|b holds. 


Exercise 2.5.5. Prove or give a counterexample to each of the following statements. 


(1) For each real number x, there exists a real number y such that $e^x - y > 0$. 

(2) There exists a real number y such that for all real numbers x, the inequality 
$e^x - y > 0$ holds. 

(3) For each real number y, there exists a real number x such that $e^x - y > 0$. 

(4) There exists a real number x such that for all real numbers y, the inequality 
$e^x - y > 0$ holds. 


Exercise 2.5.6. Prove or give a counterexample to the following statement. For each 
positive integer a, there exists a positive integer b such that 


1 1 


ae. 

2b7+b ~ ab? 
Exercise 2.5.7. Prove or give a counterexample to the following statement. For every 
real number y, there is a real number x such that e** + y = y?—1. 


Exercise 2.5.8. Prove or give a counterexample to the following statement. For each 
real number p, there exist real numbers q and r such that qgsin 3) =p. 


Exercise 2.5.9. Prove or give a counterexample to the following statement. For each 
integer x, and for each integer y, there exists an integer z such that $z^2 + 2xz - y^2 = 0$. 


Exercise 2.5.10. Let P(x,y) be a statement with free variables x and y that are real 
numbers. Let a and b be real numbers. The real number u is called the least P-number 
for a and b if two conditions hold: (1) the statements P(a,u) and P(b,u) are both true; 
and (2) if w is a real number such that P(a,w) and P(b,w) are both true, then $u \leq w$. 
Suppose that c and d are real numbers, and that there is a least P-number for c and 
d. Prove that this least P-number is unique. 




Exercise 2.5.11. A student is asked to show that the equation $x(x - 1) = 2(x + 2)$ has 
a solution. In the context of writing rigorous proofs, what is wrong with the following 
solution she handed in? 


“Proof: 

\[
\begin{gathered}
x(x - 1) = 2(x + 2) \\
x^2 - x = 2x + 4 \\
x^2 - 3x - 4 = 0 \\
(x - 4)(x + 1) = 0 \\
x = 4 \text{ or } x = -1.
\end{gathered}
\]


Therefore there are two solutions.” 


Exercise 2.5.12. Look through mathematics textbooks that you have previously used 
(in either high school or college), and find an example of a backwards proof. 


2.6 Writing Mathematics 


In mathematics—as in any other field—careful writing is of great importance for 
both the writer and the reader. Careful writing is clearly necessary if the writer’s 
proofs are to be understood by the reader. For the writer’s own benefit, putting a 
mathematical idea into written form forces her to pay attention to all the details of 
an argument. Often an idea that seemed to make sense in one’s head is found to be 
insufficient when put on paper. Any experienced mathematician knows that until an 
idea has been written up carefully, its correctness cannot be assumed, no matter how 
good the idea seemed at first. 

Mathematical correctness is certainly the ultimate test of the validity of a proof, 
but to allow us to judge mathematical correctness, a number of important 
factors in the proper writing of mathematics are needed. Some of these ideas are 
described below. See [Gil87], [Hig98], [KLR89] and [SHSD73] for further discussion 
of writing mathematics. 


1. A Written Proof Should Stand on Its Own 


The first rule of writing proofs actually applies to all forms of writing, not just math- 
ematical writing: The written text should stand on its own, without any need for clar- 
ification by the writer. Unlike writing of a more personal nature such as poetry and 
fiction, a written proof is not an expression of the writer’s feelings, but rather a docu- 
ment that should work according to objective standards. When writing a proof, state 
everything you are doing as explicitly and clearly as possible. DO NOT ASSUME 
THE READER IS A MIND READER. Err on the side of too much explanation. 


2. Write Precisely and Carefully 


There is no room in mathematics for ambiguity. The most minute matters of phrase- 
ology in mathematics may make a difference. For example, compare the statement 
“If the given integer n is prime then it is not less than 2, and it is a perfect number” 
with “If the given integer n is prime, then it is not less than 2 and it is a perfect 
number.” Something as seemingly insignificant as the change of the location of a 
comma can change the meaning of a statement. MAKE SURE WHAT YOU WRITE 
IS WHAT YOU MEAN. 

As in non-mathematical writing, revision is often the key to achieving precision 
and clarity. Do not confuse the rough drafts of a proof with the final written version. 
You should revise your proofs just as you should revise all writing, which is by trying 
to read what you wrote as if someone else (whose thoughts you do not know) had 
written it. 

Write mathematics in simple, straightforward, plodding prose. Leave your imag- 
ination to the mathematical content of your writing, but keep it out of your writing 
style, so that your writing does not get in the way of communicating your mathemat- 
ical ideas. Serious mathematics is hard enough as it is, without having unnecessary 
verbiage or convoluted sentences making it even less clear. 

Particular care should be taken with the use of mathematical terminology, where 
common words are sometimes given technical meanings different from their colloquial 
meanings (for example, the word “or”). Precision should not be overlooked in 
the statement of what is being proved. Mathematics is often read by skipping back 
and forth, and so it is important that the statements of theorems, lemmas, proposi- 
tions and the like contain all their hypotheses, rather than having the hypotheses in 
some earlier paragraphs. Better a bit of redundancy than a confused reader. 


3. Prove What Is Appropriate 


A good proof should have just the right amount of detail—neither too little nor too 
much. The question of what needs to be included in a proof, and what can be taken as 
known by the reader, is often a matter of judgment. A good guideline is to assume that 
the reader is at the exact same level of knowledge as you are, but does not know the 
proof you are writing. It is certainly safe to assume that the reader knows elementary 
mathematics at the high school level (for example, the quadratic formula). In general, 
do not assume that the reader knows anything beyond what has been covered in your 
mathematics courses. When in doubt—prove. 


4. Be Careful with Saying Things Are “Obvious” 


It is very tempting to skip over some details in a proof by saying that they are “obvi- 
ous” or are “similar to what has already been shown.” Such statements are legitimate 
if true, but are often used as a cover for uncertainty or laziness. “Obvious” is in the 
eye of the beholder; what may seem obvious to the writer after spending hours (or 
days) on a problem might not be so obvious to the reader. That something is obvious 
should mean that another person at your level of mathematical knowledge could 
figure it out in very little time and with little effort. If it does not conform to this cri- 
terion, it is not “obvious.” As an insightful colleague once pointed out, if something 
is truly obvious, then there is probably no need to remind the reader of that fact. 

The words “trivial” and “obvious” mean different things when used by math- 
ematicians. Something is trivial if, after some amount of thought, a logically very 
simple proof is found. Something is obvious if, relative to a given amount of math- 
ematical knowledge, a proof can be thought of very quickly by anyone at the given 
level. According to an old joke, a professor tells students during a lecture that a 
certain theorem is trivial; when challenged by one student, the professor thinks and 
thinks, steps out of the room to think some more, comes back an hour later, and an- 
nounces to the class that the student was right, and that the result really is trivial. The 
joke hinges on the fact that something can be trivial without being obvious. 


5. Use Full Sentences and Correct Grammar 


The use of correct grammar (such as complete sentences and correct punctuation) 
is crucial if the reader is to follow what is written. Mathematical writing should 
be no less grammatically correct than literary prose. Mathematics is not written in 
a language different from the language we use for general speech. In this text all 
mathematics is written in English. 

A distinguishing feature of mathematical writing is the use of symbols. It is very 
important to understand that mathematical symbols are nothing but shorthand for 
expressions that could just as well be written out in words. For example, the phrase 
“$x = z^2$” could be written as “the variable x equals the square of the variable z.” 
Mathematical symbols are therefore subject to the rules of grammar just as words 
are. Mathematical symbols floating freely on a page are neither understandable nor 
acceptable. All symbols, even those displayed between lines, should be embedded in 
sentences and paragraphs. 

A proof is an explanation of why something is true. A well-written proof is an ex- 
planation that someone else can understand. Proper grammar helps the reader follow 
the logical flow of the proof. Connective words such as “therefore,” “hence” and “it 
follows that” help guide the logical flow, and should be used liberally. Look through 
this entire book, and you will see that we always use complete sentences and para- 
graphs, as well as correct grammar and the frequent use of connective words (except, 
of course, for some instances of typographical errors). Though it may at times seem 
cumbersome when you are writing a proof, and would like to get it done as quickly 
as possible, sticking with correct grammar and a readable style will pay off in the 
long run. 

The following two examples of poor writing, both of which contain all the math- 
ematical ideas of the proof of Theorem 2.3.5, are written without regard to proper 
grammar and style, and are modeled on homework assignments the author has re- 
ceived from students. Compare these versions of the proof with the proof as origi- 
nally given in Section 2.3. 

The first version is genuinely awful, though for reasons the author does not 
understand, some students seem to be given the impression in high school that this sort 
of writing is acceptable. 

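$x^2 = 2$ and x rational 

$x = \frac{n}{m}$ 

n and m have no common factors 

$\left(\frac{n}{m}\right)^2 = 2 \Rightarrow \frac{n^2}{m^2} = 2 \Rightarrow n^2 = 2m^2$ which is even 

if n odd, $n^2$ odd (Exercise 2.2.4) contradiction 

$\therefore$ n even 

$n = 2k \Rightarrow (2k)^2 = 2m^2 \Rightarrow 4k^2 = 2m^2 \Rightarrow 2k^2 = m^2$ 

m even (as before) 

$\therefore$ n and m both even: impossible (no common factors) 

$\therefore$ x is not rational. 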


This second version is slightly better, being in paragraph form and with a few more 
words, but it is still far from desirable. 


$x^2 = 2$, x is rational. so $x = \frac{n}{m}$ n and m have no common factors. 
$\left(\frac{n}{m}\right)^2 = 2$, $\frac{n^2}{m^2} = 2$, $n^2 = 2m^2$. If n were odd, then $n^2$ would be odd 
by Exercise 2.2.4 a contradiction because $2m^2$ is even because it is divisible by 2. 
n not odd and hence is even. $n = 2k$ $(2k)^2 = 2m^2$, $4k^2 = 2m^2$, $2k^2 = m^2$. m is even 
as before both n and m even: impossible because any two even numbers 
have 2 as a factor, but n and m have no common factors. x is not rational. 


Mathematicians do not write papers and books this way; please do not write this way 
yourself! 


6. Use “=” Signs Properly 


One of the hallmarks of poor mathematical writing is the improper use of “=” signs. 
It is common for beginners in mathematics to write “=” when it is not appropriate, 
and to drop “=” signs when they are needed. Both these mistakes should be stu- 
diously avoided. For example, suppose that a student is asked to take the derivative 
of the function defined by $f(x) = x^2$ for all real numbers x. The first type of mistake 
occurs when someone writes something such as “$f(x) = x^2 = 2x = f'(x)$.” What is 
meant is correct, but what is actually written is false (because this function does not 
equal its derivative), and it is therefore extremely confusing to anyone other than the 
writer of the statement. THE READER SHOULD NOT HAVE TO GUESS WHAT 
THE WRITER INTENDED. 

The second type of mistake occurs when someone writes “$f(x) = x^2$, and so 2x.” 
Here again the reader has to guess what is meant by 2x. If it is meant that $f'(x) = 2x$, 
then why not write that? 

Both of these examples of the improper use of “=” signs may seem far-fetched, 
but the author has seen these and similar mistakes quite regularly on homework 
assignments and tests in calculus courses. A proper write-up could be either “$f(x) = x^2$ 
for all real numbers x, so $f'(x) = 2x$ for all x,” or simply “$(x^2)' = 2x$.” 

Another common type of error involving “=” signs involves lengthier calculations. 
Suppose that a student is asked to show that 

\[
(x^2 + 2x)(x^2 - 4)(x^2 - 2x) = (x^3 - 4x)^2.
\]


An incorrect way of writing the calculation, which the author has seen very regularly 
on homework assignments, would be 


\[
\begin{aligned}
(x^2 + 2x)(x^2 - 4)(x^2 - 2x) &= (x^3 - 4x)^2 \\
x(x + 2)(x - 2)(x + 2)x(x - 2) &= (x^3 - 4x)^2 \\
x^2(x + 2)^2(x - 2)^2 &= (x^3 - 4x)^2 \\
[x(x - 2)(x + 2)]^2 &= (x^3 - 4x)^2 \\
(x^3 - 4x)^2 &= (x^3 - 4x)^2.
\end{aligned}
\]


The problem here is that this calculation as written is a backwards proof, as discussed 
in Section 2.5. The calculation starts by stating the equation that we are trying to 
prove, and deducing from it an equation that is clearly true. A correct proof should 
start from what we know to be true, and deduce that which we are trying to prove. 
In principle, if the writer of such a backwards proof were to verify that every step is 
reversible, and indicate this fact after the above write-up, then the calculation would 
be correct. However, no one ever does that, and doing so would be more complicated 
than doing the proof correctly to begin with. 

Another incorrect way of writing this same calculation, and also one that the 
author has seen regularly, is 


\[
\begin{gathered}
(x^2 + 2x)(x^2 - 4)(x^2 - 2x) \\
x(x + 2)(x - 2)(x + 2)x(x - 2) \\
x^2(x + 2)^2(x - 2)^2 \\
[x(x - 2)(x + 2)]^2 \\
(x^3 - 4x)^2.
\end{gathered}
\]

The problem here is with what is not written, namely, the “=” signs. What is written 
is a collection of formulas, without any explicit indication of what equals what. 
The reader can often deduce what the writer of such a collection of formulas meant, 
but why risk confusion? Written mathematics should strive for clarity, and should 
therefore state exactly what the writer means. 

A helpful way to think about this second type of error is via the need for correct 
grammar. The statement “$(x^2 + 2x)(x^2 - 4)(x^2 - 2x) = (x^3 - 4x)^2$” is a complete 
sentence, with subject “$(x^2 + 2x)(x^2 - 4)(x^2 - 2x)$,” with verb “=” and with object 
“$(x^3 - 4x)^2$.” To drop the = sign is to drop the verb in this sentence. Few students 
would ever turn in a literature paper with missing verbs. And yet, unfortunately, many 
students do the equivalent in mathematics homework assignments—not because of 
any ill intention, but because, sadly, improper ways of writing lengthy calculations 
are actually taught to many students in high school. These errors should be discarded. 

There are a number of correct ways of writing the above calculation, for example 

\[
\begin{aligned}
(x^2 + 2x)(x^2 - 4)(x^2 - 2x) &= x(x + 2)(x - 2)(x + 2)x(x - 2) \\
&= x^2(x + 2)^2(x - 2)^2 \\
&= [x(x - 2)(x + 2)]^2 \\
&= (x^3 - 4x)^2,
\end{aligned}
\]

and 

\[
\begin{aligned}
(x^2 + 2x)(x^2 - 4)(x^2 - 2x) &= x(x + 2)(x - 2)(x + 2)x(x - 2) \\
&= x^2(x + 2)^2(x - 2)^2 = [x(x - 2)(x + 2)]^2 = (x^3 - 4x)^2.
\end{aligned}
\]


The differences between these correctly written calculations and the incorrect ones 
may seem extremely minor and overly picky, but mathematics is a difficult subject, 
and every little detail that makes something easier to follow (not to mention log- 
ically correct) is worthwhile. A lack of attention to fundamentals such as writing 
“=” signs correctly can often be a symptom of a general lack of attention to logical 
thoroughness. A good place to start building logical thinking is with the basics. 
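
The identity used as the running example in this discussion can also be checked numerically. The following Python sketch is only an illustration, with function names of our own choosing. Both sides of the identity are polynomials of degree at most 6, so their difference is a polynomial of degree at most 6; if the two sides agree at seven distinct points, the difference has seven roots and must therefore be the zero polynomial. Such a check confirms that the identity holds, but it is not a substitute for writing the calculation correctly.

```python
def left(x):
    # Left-hand side: (x^2 + 2x)(x^2 - 4)(x^2 - 2x).
    return (x**2 + 2*x) * (x**2 - 4) * (x**2 - 2*x)

def right(x):
    # Right-hand side: (x^3 - 4x)^2.
    return (x**3 - 4*x) ** 2

points = range(-3, 4)  # seven distinct integers: -3, -2, ..., 3
agree = all(left(x) == right(x) for x in points)
print(agree)
```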


7. Define All Symbols and Terms You Make Up 


Any mathematical symbols used as variables, even simple ones such as x or n, need 
to be defined before they are used. Such a definition might be as simple as “let x be a 
real number.” (If you are familiar with programming languages such as C++ or Java, 
think of having to declare all variables before they are used.) For example, it is not 
acceptable to write “x+y” without somewhere stating that x and y are real numbers 
(or whatever else they might be); the symbol + needs no definition, because it is not 
a variable, and its meaning is well-known. The same need for definition holds when 
the variable is a set, function, relation or anything else. Just because a letter such as 
n is often used to denote an integer, or the letter f is often used to denote a function, 
one cannot rely upon such conventions, because these same letters can be used to 
mean other things as well. If you want to use n to denote an integer, you must say so 
explicitly, and similarly for f denoting a function. 

The need to define variables can get a bit tricky when quantifiers are involved. 
It is important to understand the scope of any quantifier being used. Suppose that 
somewhere in a proof you have the statement “for each positive integer n, there 
is an integer p such that ....” The variables n and p are bound variables, and are 
defined only inside that statement. They cannot be used subsequently, unless they 
are redefined. If you subsequently want to use a positive integer, you cannot assume 
that the symbol n has already been defined as such. You would need to define it for 
the current use, by saying, as usual, something such as “let n be a positive integer.” 
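
Readers who program may find the following analogy helpful; it is offered only as an illustration, and the function name `demo` and the bound of 20 are our own choices. In Python, the variables of a generator expression are bound only inside that expression, much as the variables n and p above are bound only inside the quantified statement. An attempt to use n afterwards, without defining it anew, fails.

```python
def demo():
    # "For each positive integer n (up to 20), there is an integer p with p*p > n."
    # Both n and p are bound only inside this expression.
    claim_holds = all(
        any(p * p > n for p in range(1, n + 2)) for n in range(1, 21)
    )
    try:
        n  # attempt to reuse n outside the statement that bound it
    except NameError:
        n_status = "n is undefined here"
    else:
        n_status = "n is defined here"
    return claim_holds, n_status

print(demo())
```

To use n after the quantified statement, we would first have to define it again, which corresponds to writing “let n be a positive integer” in a proof.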

Finally, it is tempting in the course of a complicated proof to make up new words 
and symbols, and to use all sorts of exotic alphabets. For the sake of readability, avoid 
this temptation as much as possible. Do not use more symbols than absolutely 
necessary, and avoid exotic letters and complications (such as subscripts of subscripts) 
where feasible. Try to stick to standard notation. If you do make up some notation, 
make sure you define it explicitly. 


8. Break Up a Long Proof into Steps 


If a proof is long and difficult to follow, it is often wise to break it up into steps, or to 
isolate preliminary parts of the proof as lemmas (which are simply smaller theorems 
used to prove bigger theorems). If you use lemmas, be sure to state them precisely. 
Prior to going into the details of a long proof, it is often useful to give a sentence or 
two outlining the strategy of the proof. All lemmas and their proofs should be placed 
before they are used in the main theorem. Do not put a lemma inside the proof of the 
main theorem—doing so can be very confusing to the reader. 


9. Distinguish Formal vs. Informal Writing 


Writing mathematics involves both formal and informal writing. Formal writing is 
used for definitions, statements of theorems, proofs and examples; informal writing 
is for motivation, intuitive explanations, descriptions of the mathematical literature, 
etc. When writing up the solution to an exercise for a mathematics course, the writ- 
ing should be a formal proof. A lengthier exposition (such as a thesis or a book) 
will make use of both kinds of writing—formal writing to make sure that mathemat- 
ical rigor is maintained, and informal writing to make the text understandable and 
interesting. Do not confuse the two types of writing, or each will fail to do what it 
is supposed to do. Intuitive aids such as drawings, graphs, Venn diagrams and the 
like are extremely helpful when writing up a proof, though such aids should be in 
addition to the proof, not instead of it. 


10. Miscellaneous Writing Tips 


Most of the following items are from [KLR89] and [OZ96, pp. 109-118], which
have many other valuable suggestions not included here for the sake of brevity. All 
the examples of poor writing given below are based on what the author has seen in 
homework assignments and tests. 


(A) Do not put a mathematical symbol directly following punctuation. As a corol- 
lary, do not start a sentence with a symbol. The only exception to this rule is when the 
punctuation is part of the mathematical notation, for example (x,y). It is important to 
avoid ambiguities that might arise from using punctuation without proper care. For 
example, does the expression “0 < x,y < 1” mean that both x and y are between 0 
and 1, or does it mean that 0 < x and y < 1?

Bad: For all x > 3, x² > 9. y < 0, so xy < 0.

Good: For all x > 3, it follows that x² > 9. Moreover, because y < 0, then
xy < 0.

(B) In the final write-up of a proof, do not use logical symbols, such as ∧, ∨, ∃, ∀
and ⇒, as abbreviations for words. Unless you are writing about logic, where logical
symbols are necessary, the use of logical symbols makes proofs harder for others to
read. Of course, you may use any symbols you want in your scratch work.

Bad: ∀ distinct real numbers x ∧ y, if x < y ⇒ ∃ rational q such that x < q < y.

Good: For all distinct real numbers x and y, if x < y then there exists a
rational number q such that x < q < y.

(C) Use equal signs only in equations (and only then when the two sides are equal!).
Do not use equal signs when you mean “implies,” “the next step is” or “denotes.” Do
not use equal signs instead of punctuation, or as a substitute for something properly
expressed in words.

Bad: n = odd = 2k + 1.

Good: Let n be an odd number. Then n = 2k + 1 for some integer k.

Bad: For the next step, let i = i + 1.

Good: For the next step, replace i with i + 1.

Bad: Let P = the # of people in the room.

Good: Let P denote the number of people in the room.

(D) Use consistent notation throughout a proof. For example, if you start a proof 
using uppercase letters for matrices and lowercase letters for numbers, stick with 
that notation for the duration of the proof. Do not use the same notation to mean 
two different things, except when it is unavoidable due to standard mathematical 
usage—for example, the multiple uses of the notation “(a,b).” 
(E) Display long formulas, as well as short ones that are important, on their own 
lines. Recall, however, that such displayed formulas are still parts of sentences, and 
require normal punctuation. In particular, if a sentence ends with a displayed formula, 
do not forget the period at the end of the formula. Also, do not put an unnecessary 
colon in front of a displayed formula that does not require it. 

Bad: From our previous calculations, we see that:

x − r cos θ = √(y² + 3)

Good: From our previous calculations, we see that

x − r cos θ = √(y² + 3).


(F) Colons are very rarely needed. They are usually either unnecessary, as in the 
bad example in Item (E), or meant as substitutes for words in situations where words 
would be much more clear. In mathematical writing, colons should normally be used 
only in headings or at the starts of lists, and in certain mathematical symbols. Do 
not use a colon in mathematical writing in a place where you would not use one in 
non-mathematical writing. 

Bad: x² + 10x + 3 = 0 has two real solutions: 10² − 4·1·3 > 0.

Good: The equation x² + 10x + 3 = 0 has two real solutions because
10² − 4·1·3 > 0.

(G) Capitalize names such as “Theorem 2.3” and “Lemma 17.” No capitalization is
needed in phrases such as “by the previous theorem.”


Exercises


Exercise 2.6.1. State what is wrong with each of the following write-ups; some have 
more than one error. 


(1) We make use of the fact about the real numbers that if x > 0, x² > 0.

(2) To solve x² + 6x = 16:

x² + 6x = 16
x² + 6x − 16 = 0
(x − 2)(x + 8) = 0

and x = 2, x = −8.

(3) In order to solve x² + 6x = 16, then x² + 6x − 16 = 0, (x − 2)(x + 8) = 0, and
therefore x = 2, x = −8.

(4) We want to solve the equation x² − 2x = x + 10. then x² − 3x − 10, so
(x − 5)(x + 2), so 5 and −2.

(5) We want to multiply the two polynomials (7 + 2y) and (y² + 5y − 6), which
we do by computing


(7 + 2y)(y² + 5y − 6)
7y² + 35y − 42 + 2y³ + 10y² − 12y
Oy ly + By 


the answer is 2y³ + 17y² + 23y − 42.

(6) A real number x is gloppy if there is some integer n such that x² − n is sloppy.
Suppose that x is gloppy. Because n is an integer, then its square is an integer,
.... (The terms here are meaningless.)

(7) Let x be a real number. Then x² > 0 for all real numbers x, ... .

(8) It is known that √a < a for all a > 1. Hence √(a + 3) < a + 3. Hence
(√(a + 3))² < (a + 3)².


Part II 
FUNDAMENTALS 


We turn now from the “how” of mathematics, which is the methodol- 
ogy of proofs, to the “what,” which is the content of mathematics. In 
such a vastly broad subject as mathematics, it might be hard to imag- 
ine that there is anything common to all aspects of it, but in fact most 
of modern pure mathematics is based upon a few shared fundamental 
ideas such as sets, functions and relations. We now discuss the basic 
features of these ideas. The tone and style of writing in the text now 
changes correspondingly to the change in our subject matter. We will 
have less informal discussion, and will write in the more straightforward 
definition/theorem/proof style used in most advanced mathematics texts 
(though we will not drop all intuitive explanation). This change in style 
occurs for several reasons: the need to cover a fairly large amount of ma- 
terial in a reasonable amount of space; the intention of familiarizing the 
reader with the standard way in which mathematics is written; the fact 
that with practice (which comes from doing exercises), the reader will 
not need to be led through the proofs so slowly any more. 


Sets 


No one shall expel us from the paradise that Cantor created for us. 
— David Hilbert (1862-1943) 


3.1 Introduction 


A completely rigorous treatment of mathematics, it might seem, would require us to 
define every term and prove every statement we encounter. However, unless we want 
to engage in circular reasoning, or have an argument that goes backwards infinitely 
far, we have to choose some place as a logical starting point, and then do every- 
thing else on the basis of this starting point. This approach is precisely what Euclid 
attempted to do for geometry in “The Elements,” where certain axioms were formulated,
and everything else was deduced from them. (We say “attempted” because
there are some logical gaps in “The Elements,” starting with the proof of the very first 
proposition in Book I. Fortunately, these gaps can be fixed by using a more complete 
collection of axioms, such as the one proposed by Hilbert in 1899, which made Eu- 
clidean geometry into the rigorous system that most people believed it was all along. 
The discovery of non-Euclidean geometry is a separate matter. See [WW98] for de- 
tails on both these issues. This critique of Euclid, it should be stressed, is in no way 
intended to deny the overwhelming importance of his work.) 

What Euclid did not seem to realize was that what holds for theorems also holds 
for definitions. Consider, for example, Euclid’s definition of a straight line, found 
at the start of “The Elements”: “A line is breadthless length. A straight line is a line
which lies evenly with itself.” By modern standards this definition is rather worthless.
What is a “length,” breadthless or not, and what is “breadth”? What does it mean
for something to “lie evenly with itself”? This last phrase does correspond to our
intuitive understanding of straight lines, but if we want to give a rigorous definition 
such vague language will definitely not do. 

The problem with Euclid’s definitions is not just their details, but rather the at- 
tempt to define every term used. Just as we cannot prove every theorem, and have 
to start with some unproved results, we cannot define every object, and need to
start with some undefined terms. Even analytic geometry (invented long after Euclid),
which appears to do geometry without the use of axioms about geometry, ultimately
relies upon some axioms and undefined terms regarding the real numbers.
Axioms and undefined terms are unavoidable for rigorous mathematics. The modern 
approach in mathematics accepts the existence of undefined terms, as long as they 
are used properly. Ultimately, undefined objects do not bother us because such ob- 
jects do not so much exist in themselves as they are determined by the axiomatic 
properties hypothesized for them, and it is these properties that we use in proofs. 

A common misconception is that mathematicians spend their time writing down 
arbitrary collections of axioms, and then playing with them to see what they can de- 
duce from each collection. Mathematics (at least of the pure variety) is then thought 
to be a kind of formal, abstract game with no purpose other than the fun of play- 
ing it (others might phrase it less kindly). In fact, nothing could be further from the 
truth. Not only would arbitrarily chosen axioms quite likely be contradictory, but, no 
less important, they would not describe anything of interest. The various axiomatic 
schemes used in modern mathematics, in such areas as group theory, linear alge- 
bra and topology, were arrived at only after long periods of study, involving many 
concrete examples and much trial and error. You will see these various collections 
of axioms in subsequent mathematics courses. The point of axiomatic systems is to 
rigorize various parts of mathematics that are otherwise of interest, for either his- 
torical or applied reasons. Of course, mathematicians do find real pleasure in doing 
mathematics—that is why most of us do it—but it is the pleasure of thinking about 
subtle and fascinating ideas, not the pleasure of playing games. 

In this text we will not focus on developing mathematics in an axiomatic fashion, 
though a few systems of axioms will be given in Chapter 7. In the present chapter 
we will discuss the common basis for all systems of axioms used in contemporary 
mathematics, which is set theory. Though of surprisingly recent vintage, having been 
developed by Georg Cantor in the late nineteenth century, set theory has become 
widely accepted among mathematicians as the starting place for rigorous mathematics.
We will take an intuitive approach to set theory (often referred to as “naive set
theory”), but then build on it rigorously. Set theory itself can be done axiomatically,
though doing so is non-trivial, and there are a number of different approaches that
are used. In Section 3.5 we will informally discuss the most common axiomatic approach
to set theory, the Zermelo–Fraenkel Axioms, but other than in that section we
follow the standard practice of treating the foundations of set theory intuitively,
and then proving everything else rigorously on the basis of sets.

For additional information about naive set theory see the classic reference [Hal60]; 
see [EFT94, Section 7.4] for a discussion of the role of set theory as a basis for math- 
ematics; and see [Sup60], [Ham82], [Dev93] and [Vau95] for more about axiomatic 
set theory. 


3.2 Sets—Basic Definitions 


The basic undefined term we will use is that of a set, which we take to be any col- 
lection of objects, not necessarily mathematical ones. For example, we can take the 
set of all people born in San Francisco in 1963. The objects contained in the set are 
called the elements or members of the set. If A is a set and a is an element of A, we
write

a ∈ A.

If a is not in the set A, we write

a ∉ A.

Given any set A and any object a, we assume that precisely one of a ∈ A or a ∉ A
holds.

The simplest way of presenting a set is to list its elements, which by standard 
convention are written between curly brackets. For example, the set consisting of the 
letters a, b, c and d is written 

{a,b,c,d}. 


The order in which the elements of a set are listed is irrelevant. Hence the set {1,2,3} 
is the same as the set {2,3, 1}. Each element of a set is listed once and only once, so 
that we would never write {1,2,2,3}. 
There are four sets of numbers that we will use regularly: the set of natural
numbers

{1,2,3,...},

denoted N; the set of integers

{...,−2,−1,0,1,2,...},

denoted Z; the set of rational numbers, denoted Q, which is the set of fractions;
the set of real numbers, denoted R, which is the set of all the numbers that are
informally thought of as forming the number line.

An extremely valuable set we will regularly encounter is the empty set (also 
called the null set) which is the set that does not have any elements in it. That is, the 
empty set is the set { }. This set is denoted ∅. It may seem strange to consider a set
that doesn’t have anything in it, but the role of the empty set in set theory is somewhat 
analogous to the role of zero in arithmetic. (The number zero was a historically late 
arrival, and presumably might have seemed strange to some at first, just as the empty 
set might seem strange at first today; zero does not seem strange to us today because 
we start getting used to it at a young age.)

It is sometimes not convenient, or not possible, to list explicitly all the elements 
of a set. In such situations it is sometimes possible to present a set by describing it as 
the set of all elements satisfying some criteria. For example, consider the set of all 
integers that are perfect squares. We could write this set as 


S = {n ∈ Z | n is a perfect square},

which is read “the set of all n in Z such that n is a perfect square.” Some books use
a colon “:” instead of a vertical line in the above set notation, though the meaning is
exactly the same, namely, “such that.” If we want to write the above set even more
carefully we could write

S = {n ∈ Z | n = k² for some k ∈ Z}.

If we wanted to emphasize the existential quantifier, we could write

S = {n ∈ Z | there exists k ∈ Z such that n = k²}.    (3.2.1)

The letters n and k used in this definition are “dummy variables.” We would obtain
the exact same set if we wrote

S = {x ∈ Z | there exists r ∈ Z such that x = r²}.    (3.2.2)
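
For readers who program, set-builder notation corresponds closely to a set comprehension. The following Python sketch is only an illustration, not part of the mathematical development: because Z is infinite, it assumes a finite stand-in (the integers from −100 to 100, a choice made here purely for convenience), and the names UNIVERSE, S_nk and S_xr are hypothetical.

```python
# Finite stand-in for Z (an assumption made only so the sets can be enumerated).
UNIVERSE = range(-100, 101)

# S = {n in Z | there exists k in Z such that n = k^2}, restricted to UNIVERSE.
S_nk = {n for n in UNIVERSE if any(n == k * k for k in UNIVERSE)}

# The same definition with the dummy variables renamed to x and r.
S_xr = {x for x in UNIVERSE if any(x == r * r for r in UNIVERSE)}

print(S_nk == S_xr)    # True: renaming dummy variables does not change the set
print(sorted(S_nk))    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

In Python 3 the names n, k, x and r are local to their comprehensions, which loosely mirrors the way dummy variables have no meaning outside the set-builder notation.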


The above method of defining sets is quite straightforward, but there is one point 
about this method that needs to be stressed. Because the letters x and r in Equation 3.2.2
are dummy variables, we cannot use them outside the “{ | }” notation
without redefinition. Hence, if we want to refer to some element of the set defined
in Equation 3.2.1 and Equation 3.2.2, for example pointing out that such elements
must be non-negative, it would not be correct to say simply “observe that x ≥ 0.” By
contrast, it would be correct to say “observe that x ≥ 0 for all x ∈ S.” However, this
latter formulation has the defect that if we want to continue to discuss elements in S,
we would have to define x once again, because the x in “x ≥ 0 for all x ∈ S” is bound
by the quantifier. A better approach would be to write “let x ∈ S; then x ≥ 0.” Now
that x has been defined as an element of S, not bound by a quantifier, we can use it 
as often as we wish without redefinition. 

An example of the above method of defining sets is seen in the following widely 
used definition. 


Definition 3.2.1. An open bounded interval is a set of the form

(a,b) = {x ∈ R | a < x < b},

where a,b ∈ R and a < b. A closed bounded interval is a set of the form

[a,b] = {x ∈ R | a ≤ x ≤ b},

where a,b ∈ R and a < b. A half-open interval is a set of the form

[a,b) = {x ∈ R | a ≤ x < b} or (a,b] = {x ∈ R | a < x ≤ b},

where a,b ∈ R and a < b. An open unbounded interval is a set of the form

(a,∞) = {x ∈ R | a < x} or (−∞,b) = {x ∈ R | x < b} or (−∞,∞) = R,

where a,b ∈ R. A closed unbounded interval is a set of the form

[a,∞) = {x ∈ R | a ≤ x} or (−∞,b] = {x ∈ R | x ≤ b},

where a,b ∈ R. △

Observe that there are no intervals that are “closed” at ∞ or −∞; for example,
there is no interval of the form [a,∞], because “∞” is not a real number, and therefore
it cannot be included in an interval contained in the real numbers. The symbol “∞”
is simply a shorthand way of saying that an interval “goes on forever.”

If a,b ∈ R and a < b, then the set (a,b) is “contained in” the set [a,b]. This notion
of a set being contained in another is formalized as follows.


Definition 3.2.2. Let A and B be sets. The set A is a subset of the set B, denoted
A ⊆ B, if x ∈ A implies x ∈ B. If A is not a subset of B, we write A ⊈ B. △


Observe that if A and B are sets and if A ⊈ B, then it is still possible that some of
the elements of A are in B, just not all. See Figure 3.2.1 for a schematic drawing of
A ⊆ B and A ⊈ B.


Fig. 3.2.1. [Schematic drawings of the two cases A ⊆ B and A ⊈ B.]


Example 3.2.3.


(1) Let A = {1,2,3,4} and B = {1,3}. Then B ⊆ A and A ⊈ B.

(2) Let M be the set of all men, and let T be the set of all proctologists. Then
T ⊈ M because not all proctologists are men, and M ⊈ T because not all men are
proctologists. ♦


There is a standard strategy for proving a statement of the form “A ⊆ B,” which
is to take an arbitrary element a ∈ A, and then to use the definitions of A and B to
deduce that a ∈ B. Such a proof typically has the following form.


Proof. Let a ∈ A.


(argumentation)


Then a ∈ B. Hence A ⊆ B.


We will see a number of proofs using this strategy throughout this chapter. To
prove a statement of the form “A ⊈ B,” by contrast, we simply need to find some a ∈ A
such that a ∉ B, a fact that seems intuitively clear, and that can be seen formally as
follows. The statement A ⊆ B can be written as (∀x)([x ∈ A] → [x ∈ B]). Then A ⊈ B
can be written as ¬(∀x)([x ∈ A] → [x ∈ B]), which is equivalent to (∃x)([x ∈ A] ∧ [x ∉ B])
by Fact 1.5.1 (1) and Fact 1.3.2 (14).
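
The two quantified statements above can be mirrored computationally for finite sets. The following Python sketch is an illustration under that finiteness assumption; the function names is_subset and non_subset_witness are hypothetical, and the sets are those of Example 3.2.3 (1).

```python
def is_subset(A, B):
    """Universal statement: every element of A is also an element of B."""
    return all(x in B for x in A)


def non_subset_witness(A, B):
    """Existential statement: return some x in A with x not in B, else None."""
    for x in A:
        if x not in B:
            return x
    return None


A = {1, 2, 3, 4}
B = {1, 3}
print(is_subset(B, A))            # True
print(is_subset(A, B))            # False
print(non_subset_witness(A, B))   # 2 or 4 (set iteration order is unspecified)
print(non_subset_witness(B, A))   # None: no witness exists, so B is a subset of A
```

A single witness suffices to show that one set is not a subset of another, whereas showing that it is a subset requires checking every element.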

It is important to distinguish between the notion of an object being an element
of a set, and the notion of a set being a subset of another set. For example, let A =
{a,b,c}. Then a ∈ A and {a} ⊆ A are true, whereas the statements “a ⊆ A” and
“{a} ∈ A” are false. Also, observe that a set can be an element of another set. Let
B = {{a},b,c}. Observe that B is not the same as the set A. Then {a} ∈ B and
{{a}} ⊆ B are true, but “a ∈ B” and “{a} ⊆ B” are false.
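
The distinction between elements and subsets can be tried out directly in Python, where `in` tests membership and `<=` tests the subset relation. This is a sketch only: the letters a, b, c are modeled as strings, and because a Python set cannot contain an ordinary (mutable) set, frozenset is used here as a stand-in for the inner set {a}.

```python
A = {"a", "b", "c"}
B = {frozenset({"a"}), "b", "c"}   # models B = {{a}, b, c}

print("a" in A)                   # True:  a is an element of A
print({"a"} <= A)                 # True:  {a} is a subset of A
print(frozenset({"a"}) in A)      # False: {a} is not an element of A

print(frozenset({"a"}) in B)      # True:  {a} is an element of B
print({frozenset({"a"})} <= B)    # True:  {{a}} is a subset of B
print("a" in B)                   # False: a is not an element of B
print({"a"} <= B)                 # False: {a} is not a subset of B
print(A == B)                     # False: A and B are different sets
```

The statement “a ⊆ A” has no counterpart in this sketch, because the string "a" is not a set and so cannot be compared with `<=` against one.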

The following lemma states some basic properties of subsets. The proof of this 
lemma, our first proof about sets, makes repeated use of the strategy mentioned above 
for showing that one set is a subset of another set. 


Lemma 3.2.4. Let A, B and C be sets. 


1. A ⊆ A.
2. ∅ ⊆ A.
3. If A ⊆ B and B ⊆ C, then A ⊆ C.


Proof. 


(1). To show that A ⊆ A, we start by choosing an arbitrary element a ∈ A, where
we think of this “A” as the one on the left-hand side of the expression “A ⊆ A.” It
then follows that a ∈ A, where we now think of this “A” as the one on the right-hand
side of the expression “A ⊆ A.” Hence A ⊆ A, using the definition of subsets.


(2). We give two proofs, because both are instructive. First, we have a direct
proof. To show that ∅ ⊆ A, we need to show that if a ∈ ∅, then a ∈ A. Because a ∈ ∅
is always false, then the logical implication “if a ∈ ∅, then a ∈ A” is always true,
using the precise definition of the conditional given in Section 1.2.

Next, we have a proof by contradiction. Suppose that ∅ ⊈ A. Then there exists
some x ∈ ∅ such that x ∉ A. This statement cannot be true, however, because there
is no x such that x ∈ ∅. We have therefore reached a contradiction, and hence the
desired result is true.

This proof by contradiction might not appear to fit the standard outline for such
proofs as described in Section 2.3, because it does not appear as if we are viewing
the statement being proved as having the form P → Q. In fact, there are two ways of
viewing the statement being proved as having this form. For the direct proof given
above, we viewed the statement being proved as (∀A)([a ∈ ∅] → [a ∈ A]). We then
chose an arbitrary set A, and proved the statement [a ∈ ∅] → [a ∈ A]. For the proof
by contradiction, we viewed the statement being proved as “if A is a set, then ∅ ⊆ A,”
and then indeed used our standard method of doing proof by contradiction.


(3). This proof, having no logical tricks, is extremely typical. Let a ∈ A. Because
A ⊆ B, it follows that a ∈ B. Because B ⊆ C, it follows that a ∈ C. Therefore we see
that a ∈ A implies a ∈ C, and hence A ⊆ C.


When are two sets equal to one another? Intuitively, two sets are equal when they
have the same elements. We formally define this concept as follows.


Definition 3.2.5. Let A and B be sets. The set A equals the set B, denoted A = B, if
A ⊆ B and B ⊆ A. The set A is a proper subset of the set B, denoted A ⫋ B, if A ⊆ B
and A ≠ B. △


There is a bit of variation in the mathematical literature for the notation used for
proper subsets. Some texts use A ⊂ B to mean A is a proper subset of B, whereas
others use the notation A ⊂ B to mean what we write as A ⊆ B.


Example 3.2.6. 


(1) Let A and B be the sets in Example 3.2.3 (1). Then B is a proper subset of A.

(2) Let X = {a,b,c}, and let Y = {c,b,a}. Then clearly X ⊆ Y and Y ⊆ X, so
X = Y.

(3) Let

P = {x ∈ R | x² − 5x + 6 < 0},

and

Q = {x ∈ R | 2 < x < 3}.

We will show that P = Q, putting in more detail than is really necessary for a problem
at this level of difficulty, but we want to make the proof strategy as explicit as
possible.

First, we show that P ⊆ Q. Let y ∈ P. Then y² − 5y + 6 < 0. Hence (y − 2)(y − 3) < 0.
It follows that either y − 2 < 0 and y − 3 > 0, or that y − 2 > 0 and y − 3 < 0. If
y − 2 < 0 and y − 3 > 0, then y < 2 and 3 < y; because there is no number that satisfies
both these inequalities, then this case cannot occur. If y − 2 > 0 and y − 3 < 0, then
2 < y and y < 3. Hence 2 < y < 3. It follows that y ∈ Q. Therefore P ⊆ Q.

Next, we show that Q ⊆ P. Let z ∈ Q. Then 2 < z < 3. Hence 2 < z and z < 3, and
so z − 2 > 0 and z − 3 < 0. Therefore (z − 2)(z − 3) < 0, and therefore z² − 5z + 6 < 0.
Hence z ∈ P. Therefore Q ⊆ P.

By combining the previous two paragraphs we deduce that P = Q. ♦
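
As a sanity check, and emphatically not as a substitute for the proof above, one can compare the two membership conditions on a finite sample of rational points. The Python sketch below is an illustration only; the function names in_P and in_Q and the choice of sample points are assumptions made here.

```python
from fractions import Fraction


def in_P(x):
    """Membership condition for P: x^2 - 5x + 6 < 0."""
    return x * x - 5 * x + 6 < 0


def in_Q(x):
    """Membership condition for Q: 2 < x < 3."""
    return 2 < x < 3


# Sample points 0, 1/100, 2/100, ..., 5, using exact rational arithmetic.
samples = [Fraction(i, 100) for i in range(0, 501)]

print(all(in_P(x) == in_Q(x) for x in samples))   # True
```

Agreement on finitely many points proves nothing about all real numbers, but a disagreement would have exposed an error in the claimed equality.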


Example 3.2.6 (3) may seem to be much ado about nothing, because the result 
proved is trivial, but the strategy used is not. Virtually every time we show that two 
sets A and B are equal, we go back to the definition of equality of sets. The strategy 
for proving a statement of the form “A = B” for sets A and B is therefore to prove 
that A ⊆ B and that B ⊆ A. Such a proof typically has the following form.


Proof. Let a ∈ A.


(argumentation)


Then a ∈ B. Therefore A ⊆ B.

Next, let b ∈ B.


(argumentation)


Then b ∈ A. Hence B ⊆ A.

We conclude that A = B.


We will see a number of examples of this strategy, starting with the proof of 
Theorem 3.3.3 (4) in the next section. 

The following lemma gives the most basic properties of equality of sets. The 
three parts of the lemma correspond to three properties of relations we will discuss 
in Sections 5.1 and 5.3. 


Lemma 3.2.7. Let A, B and C be sets. 


1. A = A.
2. If A = B then B = A.
3. If A = B and B = C, then A = C.


Proof. All three parts of this lemma follow straightforwardly from the definition of 
equality of sets together with Lemma 3.2.4. Details are left to the reader. 


In some situations we will find it useful to look at not just one subset of a given 
set, but at all subsets of the set. In particular, we can form a new set, the elements of 
which are the subsets of the given set. 


Definition 3.2.8. Let A be a set. The power set of A, denoted P(A), is the set defined
by

P(A) = {X | X ⊆ A}. △


Example 3.2.9.


(1) Because ∅ ⊆ ∅, then P(∅) = {∅}. In particular, we see that P(∅) ≠ ∅.

(2) Let A = {a,b,c}. Then the subsets of A are ∅, {a}, {b}, {c}, {a,b}, {a,c},
{b,c} and {a,b,c}. The last of these subsets is not proper, but we need all subsets,
not only the proper ones. Therefore

P(A) = {∅, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}}.

It can be seen intuitively that if A is a finite set with n elements, then P(A) is a finite
set with 2ⁿ elements; by Part (1) of this example we see that this formula holds even
when n = 0. This formula is proved in Theorem 7.7.10 (1). ♦
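
For a finite set, the power set can be enumerated mechanically, which gives a quick way to observe the count of subsets in small cases. The following Python sketch is an illustration only; the function name power_set is hypothetical, and subsets are represented as frozensets so that they can themselves be elements of a set.

```python
from itertools import combinations


def power_set(A):
    """Return the set of all subsets of the finite set A, as frozensets."""
    elements = list(A)
    return {
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    }


print(power_set(set()) == {frozenset()})    # True: one element, the empty set
print(len(power_set({"a", "b", "c"})))      # 8

# Observe the count of subsets for sets with 0, 1, ..., 5 elements.
print([len(power_set(set(range(n)))) for n in range(6)])   # [1, 2, 4, 8, 16, 32]
```

Such an enumeration only confirms the formula for the particular sizes tried; it is not a proof for all n.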


Sets can be either finite or infinite in size. The set A in Example 3.2.9 (2) is finite, 
whereas sets such as N or R are infinite. For now we will use the terms “finite” and 
“infinite” intuitively. These concepts will be defined rigorously in Section 6.5. If a 
set A is finite, then we use the notation |A| to denote the number of elements in A 
(often referred to as the “cardinality” of A). Some basic facts about the cardinalities 
of finite sets can be found in Sections 6.5, 7.6 and 7.7. 


Exercises 


Exercise 3.2.1. How many elements does the set A = {a,b, {a,b}} have? 


Exercise 3.2.2. Which of the following are true and which are false?

(1) 3 ∈ (3,5).                    (6) [1,2] ⊆ {0,1,2,3}.
(2) 10 ∉ (−∞, 27].                (7) {−1,0,1} ⊆ [−1,1).
(3) 7 ∈ {2,3,4,...,11}.           (8) [5,7] ∈ (4,∞).
(4) 1 (2,0).                      (9) {2,4,8,16,...} ⊆ [2,∞).


6) +13 €(..4-3.91), 


Exercise 3.2.3. What are the following sets commonly called? 


(1) {n ∈ Z | n = 2m for some m ∈ Z}.

(2) {k ∈ N | there exist p,q ∈ N such that k = pq, and that 1 < p < k and 1 < q < k}.

(3) {x ∈ R | there exist a,b ∈ Z such that b ≠ 0 and x = a/b}.


Exercise 3.2.4. Let P be the set of all people, let M be the set of all men and let F 
be the set of all women. Describe each of the following sets with words. 


(1) {x ∈ P | x ∈ M and x has a child}.

(2) {x ∈ P | there exist y,z ∈ P such that y is a child of x, and z is a child of y}.

(3) {x ∈ P | there exists m ∈ F such that x is married to m}.

(4) {x ∈ P | there exists q ∈ P such that x and q have the same mother}.

(5) {x ∈ P | there exists h ∈ P such that h is older than x}.

(6) {x ∈ P | there exists n ∈ M such that x is the child of n, and x is older than n}.


Exercise 3.2.5. Describe the following sets in the style of Equation 3.2.1. 


(1) The set of all positive real numbers. 

(2) The set of all odd integers. 

(3) The set of all rational numbers that have a factor of 5 in their denominators. 

(4) The set {−64, −27, −8, −1, 0, 1, 8, 27, 64}.

(5) The set {1,5,9,13,17,21,...}. 
Exercise 3.2.6. We assume for this exercise that functions are intuitively familiar 
to the reader (a formal definition will be given in Chapter 4). Let F denote the set 
of all functions from the real numbers to the real numbers; let D denote the set of 
all differentiable functions from the real numbers to the real numbers; let P denote 
the set of all polynomial functions from the real numbers to the real numbers; let C 
denote the set of all continuous functions from the real numbers to the real numbers; 
let E denote the set of all exponential functions from the real numbers to the real 
numbers. Which of these sets are subsets of which? 


Exercise 3.2.7. Among the following sets, which is a subset of which? 


M is the set of all men; 

W is the set of all women; 
P is the set of all parents; 
O is the set of all mothers;
F is the set of all fathers;
U is the set of all uncles;
A is the set of all aunts;


C is the set of all people who are children of other people. 
Exercise 3.2.8. Among the following sets, which is a subset of which? 


C = {n ∈ Z | there exists k ∈ Z such that n = k*};
E = {n ∈ Z | there exists k ∈ Z such that n = 2k};
P = {n ∈ Z | n is a prime number};
N = {n ∈ Z | there exists k ∈ Z such that n = k*};
S = {n ∈ Z | there exists k ∈ Z such that n = 6k};
D = {n ∈ Z | there exists k ∈ Z such that n = k − 5};
B = {n ∈ Z | n is non-negative}.


Exercise 3.2.9. Find sets A and B such that A ∈ B and A ⊆ B. (It might appear as if
we are contradicting what was discussed after Example 3.2.3; the solution, however,
is the “exception that proves the rule.”)


Exercise 3.2.10. Let A, B and C be sets. Suppose that A ⊆ B and B ⊆ C and C ⊆ A.
Prove that A = B = C.


Exercise 3.2.11. [Used in Theorem 3.5.6.] Let A and B be sets. Prove that it is not
possible that A ⫋ B and B ⊆ A are both true.


Exercise 3.2.12. Let A and B be any two sets. Is it true that one of A ⊆ B or A = B
or A ⊇ B must be true? Give a proof or a counterexample.


Exercise 3.2.13. Let A = {x,y,z,w}. List all the elements in P(A).


Exercise 3.2.14. Let A and B be sets. Suppose that A ⊆ B. Prove that P(A) ⊆ P(B).


Exercise 3.2.15. List all elements of each of the following sets. 


(1) P(P(∅)).                      (2) P(P({∅})).


Exercise 3.2.16. Which of the following are true and which are false? 


(1) {∅} ⊆ G for all sets G.        (6) ∅ ∈ P(G) for all sets G.
(2) ∅ ⊆ G for all sets G.          (7) {{∅}} ⊆ P(∅).
(3) ∅ ⊆ P(G) for all sets G.       (8) {∅} ⊆ {{∅, {∅}, {{∅}}}}.
(4) {∅} ⊆ P(G) for all sets G.     (9) P({∅}) = {∅, {∅}}.
(5) ∅ ∈ G for all sets G.


3.3 Set Operations


There are a number of ways to make new sets out of old, somewhat analogous to 
combining numbers via addition and multiplication. A closer analogy is the way in 
which we combined statements in Section 1.2. The two most basic set operations, 
which we now describe, correspond to the logical operations “or” and “and.” 


Definition 3.3.1. Let A and B be sets. The union of A and B, denoted A ∪ B, is the
set defined by

A ∪ B = {x | x ∈ A or x ∈ B}.

The intersection of A and B, denoted A ∩ B, is the set defined by

A ∩ B = {x | x ∈ A and x ∈ B}. △


If A and B are sets, the set A ∪ B is the set containing everything that is either
in A or B or both (recall our discussion of the mathematical use of the word “or” in
Section 1.2). The set A ∩ B is the set containing everything that is in both A and B.


Example 3.3.2. Let A = {x,y,z,p} and B = {x,q}. Then

A ∪ B = {x,y,z,p,q} and A ∩ B = {x}. ♦
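
The sets in Example 3.3.2 can be reproduced in Python, both with the built-in operators and by transcribing the definitions of union and intersection as comprehensions. This is a sketch only: the elements are modeled as strings, and the finite list called universe is an assumption introduced here so that the comprehensions have something to range over.

```python
A = {"x", "y", "z", "p"}
B = {"x", "q"}

# Built-in operators: | is union, & is intersection.
print(A | B == {"x", "y", "z", "p", "q"})   # True
print(A & B == {"x"})                       # True

# The same sets written directly from the definitions, using inclusive "or".
universe = list(A) + list(B)
union = {x for x in universe if x in A or x in B}
intersection = {x for x in universe if x in A and x in B}

print(union == A | B)          # True
print(intersection == A & B)   # True
```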


To help visualize unions and intersections of sets (as well as other constructions 
we will define), we can make use of what are known as Venn diagrams. A Venn 
diagram for a set is simply a region of the plane that schematically represents the 
set. See Figure 3.3.1 (i) for a Venn diagram representing two sets A and B, placed 
in the most general possible relation to each other. In Figure 3.3.1 (ii) the region
representing A ∪ B is shaded, and in Figure 3.3.1 (iii) the region representing A ∩ B is
shaded.


Fig. 3.3.1. [Venn diagrams: (i) two sets A and B; (ii) A ∪ B shaded; (iii) A ∩ B shaded.]


Venn diagrams can be useful for convincing ourselves of the intuitive truth of various
propositions concerning sets. For instance, we will prove in Theorem 3.3.3 (5)
that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) for any three sets A, B and C. To gain an intuitive
feeling for this result, we can find the region in a Venn diagram for each of the two
sides of the equation, and then observe that the two regions are the same, namely,
the shaded region in Figure 3.3.2. Although Venn diagrams seem much easier to use
than proofs, a Venn diagram is no more than a visual aid, and is never a substitute
for a real proof. Moreover, it is tricky to use Venn diagrams for more than three sets
at a time, and this severely limits their use.


Cc 


Fig. 3.3.2. 


Do the familiar properties of addition and multiplication of numbers (such as 
commutativity and associativity) also hold for union and intersection of sets? The 
following theorem shows that such properties do hold, although they are not exactly 
the same as for addition and multiplication. 


Theorem 3.3.3. Let A, B and C be sets. 


1. A ∩ B ⊆ A and A ∩ B ⊆ B. If X is a set such that X ⊆ A and X ⊆ B, then
X ⊆ A ∩ B.

2. A ⊆ A ∪ B and B ⊆ A ∪ B. If Y is a set such that A ⊆ Y and B ⊆ Y, then
A ∪ B ⊆ Y.

3. A ∪ B = B ∪ A and A ∩ B = B ∩ A (Commutative Laws).

4. (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C) (Associative Laws).

5. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(Distributive Laws).

6. A ∪ ∅ = A and A ∩ ∅ = ∅ (Identity Laws).

7. A ∪ A = A and A ∩ A = A (Idempotent Laws).

8. A ∪ (A ∩ B) = A and A ∩ (A ∪ B) = A (Absorption Laws).

9. If A ⊆ B, then A ∪ C ⊆ B ∪ C and A ∩ C ⊆ B ∩ C.


Proof. We will prove Parts (4) and (5), leaving the rest to the reader in Exercise 3.3.6. 


(4). We will show that (A ∪ B) ∪ C = A ∪ (B ∪ C); the other equation can be
proved similarly, and we omit the details. As usual, the equality of the two sets under
consideration is demonstrated by showing that each is a subset of the other.

Let x ∈ (A ∪ B) ∪ C. Then x ∈ A ∪ B or x ∈ C. First, suppose that x ∈ A ∪ B. Then
x ∈ A or x ∈ B. If x ∈ A then x ∈ A ∪ (B ∪ C) by Part (2) of this theorem, and if x ∈ B
then x ∈ B ∪ C, and hence x ∈ A ∪ (B ∪ C). Second, suppose that x ∈ C. It follows
from Part (2) of this theorem that x ∈ B ∪ C, and hence x ∈ A ∪ (B ∪ C). Putting the
two cases together, we deduce that (A ∪ B) ∪ C ⊆ A ∪ (B ∪ C).

The proof that A ∪ (B ∪ C) ⊆ (A ∪ B) ∪ C is similar to the above proof, simply
changing the roles of A and C, and we omit the details.

We deduce that (A ∪ B) ∪ C = A ∪ (B ∪ C).


(5). We prove A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C); the other equation can be proved
similarly. Let x ∈ A ∩ (B ∪ C). Then x ∈ A and x ∈ B ∪ C. Hence x ∈ B or x ∈ C. If
x ∈ B we deduce that x ∈ A ∩ B, and if x ∈ C we deduce that x ∈ A ∩ C. In either
case, we use Part (2) of this theorem to see that x ∈ (A ∩ B) ∪ (A ∩ C). Therefore
A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C).

Now let y ∈ (A ∩ B) ∪ (A ∩ C). Then y ∈ A ∩ B or y ∈ A ∩ C. First, suppose that
y ∈ A ∩ B. Then y ∈ A and y ∈ B. Hence y ∈ B ∪ C by Part (2) of the theorem, and
therefore y ∈ A ∩ (B ∪ C). Second, suppose that y ∈ A ∩ C. A similar argument to the
previous case shows that y ∈ A ∩ (B ∪ C); we omit the details. Combining the two
cases we deduce that (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).

We conclude that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).


It is seen in Part (5) of Theorem 3.3.3 that both union and intersection distribute 
over each other, which is quite different from addition and multiplication of numbers, 
where multiplication distributes over addition, but not vice versa. 
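
The contrast between sets and numbers can be explored with a small brute-force experiment. The sketch below checks both Distributive Laws for every choice of A, B and C among the subsets of a small universe, and then evaluates the analogous (false) identity for numbers. The helper names `subsets` and `distributive_laws_hold` are hypothetical, and a finite check of this kind only illustrates the theorem; it is not a substitute for the proof.

```python
from itertools import combinations


def subsets(universe):
    """Return all subsets of `universe` as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def distributive_laws_hold(universe):
    """Check both Distributive Laws for all A, B, C contained in `universe`."""
    all_sets = subsets(universe)
    for A in all_sets:
        for B in all_sets:
            for C in all_sets:
                if A & (B | C) != (A & B) | (A & C):
                    return False
                if A | (B & C) != (A | B) & (A | C):
                    return False
    return True


print(distributive_laws_hold({1, 2, 3}))  # True

# For numbers, addition does not distribute over multiplication:
print(2 + (3 * 4) == (2 + 3) * (2 + 4))  # False
```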

The following definition formalizes the notion of two sets having no elements in 
common. 


Definition 3.3.4. Let A and B be sets. The sets A and B are disjoint if A ∩ B = ∅. △


Example 3.3.5. Let E be the set of even integers, let O be the set of odd integers and
let P be the set of prime numbers. Then E and O are disjoint, whereas E and P are
not disjoint (because E ∩ P = {2}). ♦


Another useful set operation is given in the following definition. 


Definition 3.3.6. Let A and B be sets. The difference (also called the set difference)
of A and B, denoted A − B, is the set defined by


A − B = {x | x ∈ A and x ∉ B}. △


Some books use the notation A \ B instead of A − B. The set A − B is the set
containing everything that is in A but is not in B. The set A − B is defined for any two
sets A and B; it is not necessary to have B ⊆ A. See Figure 3.3.3 for a Venn diagram
of A − B.


[Venn diagram of two sets A and B, with the region A − B shaded]
Fig. 3.3.3.


Example 3.3.7. Let A and B be the sets in Example 3.3.2. Then
A − B = {y, z, p}. ♦


The following theorem gives some standard properties of set difference. 


Theorem 3.3.8. Let A, B and C be sets. 


1. A − B ⊆ A.

2. (A − B) ∩ B = ∅.

3. A − B = ∅ if and only if A ⊆ B.

4. B − (B − A) = A if and only if A ⊆ B.

5. If A ⊆ B, then A − C = A ∩ (B − C).

6. If A ⊆ B, then C − A ⊇ C − B.

7. C − (A ∪ B) = (C − A) ∩ (C − B) and C − (A ∩ B) = (C − A) ∪ (C − B)
(De Morgan’s Laws).


Proof. We will prove Part (7), leaving the rest to the reader in Exercise 3.3.7. 


(7). We will show that C − (A ∪ B) = (C − A) ∩ (C − B); the other equation can
be proved similarly, and we omit the details. Let x ∈ C − (A ∪ B). Then x ∈ C and
x ∉ A ∪ B. It follows that x ∉ A and x ∉ B, because x ∈ A or x ∈ B would imply that
x ∈ A ∪ B. Because x ∈ C and x ∉ A, then x ∈ C − A. Because x ∈ C and x ∉ B, then
x ∈ C − B. Hence x ∈ (C − A) ∩ (C − B). Therefore C − (A ∪ B) ⊆ (C − A) ∩ (C − B).

Now let y ∈ (C − A) ∩ (C − B). Hence y ∈ C − A and y ∈ C − B. Because y ∈ C − A,
it follows that y ∈ C and y ∉ A. Because y ∈ C − B, it follows that y ∈ C and y ∉ B.
Because y ∉ A and y ∉ B, it follows that y ∉ A ∪ B. Therefore y ∈ C − (A ∪ B). Hence
(C − A) ∩ (C − B) ⊆ C − (A ∪ B).

We conclude that C − (A ∪ B) = (C − A) ∩ (C − B).
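
Both De Morgan's Laws can be tried out on concrete sets. The sketch below uses Python's set difference operator on three small sets chosen arbitrarily for illustration; agreement on one example is of course only a sanity check, not a proof.

```python
# Arbitrary illustrative sets (not taken from the text).
A = {1, 2, 3}
B = {3, 4}
C = {1, 3, 4, 5, 6}

# First law: C - (A ∪ B) = (C - A) ∩ (C - B)
lhs_union = C - (A | B)
rhs_union = (C - A) & (C - B)

# Second law: C - (A ∩ B) = (C - A) ∪ (C - B)
lhs_inter = C - (A & B)
rhs_inter = (C - A) | (C - B)

print(sorted(lhs_union), sorted(rhs_union))  # [5, 6] [5, 6]
print(sorted(lhs_inter), sorted(rhs_inter))  # [1, 4, 5, 6] [1, 4, 5, 6]
```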


There is one more fundamental way of forming new sets out of old that we will 
be using regularly. Think of how the plane is coordinatized by ordered pairs of real 
numbers. In the following definition we make use of the notion of an ordered pair of 
elements, denoted (a,b), where a and b are elements of some given sets. Unlike a set 
{a,b}, where the order of the elements does not matter (so that {a,b} = {b,a}), in 
an ordered pair the order of the elements does matter. We take this idea intuitively, 
though it can be defined rigorously in terms of sets (see [Mac96]). The idea is to 
represent the ordered pair (a,b) as the set {{a}, {a,b}}. Though it may seem obvi- 
ous, it is important to state that the ordered pair (a,b) equals the ordered pair (c,d) 
if and only if a =c and b =d. (The notation “(a,b)” used to denote an ordered pair 
is, unfortunately, identical to the notation “(a,b)” used to denote an open bounded 
interval of real numbers, as defined in Section 3.2. Both uses of this notation are very 
widespread, so we are stuck with them. In practice the meaning of “(a,b)” is usually 
clear from the context.) 
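
The set-theoretic encoding of an ordered pair mentioned above can be experimented with directly. The following sketch, with the hypothetical helper name `kuratowski_pair`, builds {{a}, {a, b}} out of Python frozensets and shows that, unlike the set {a, b}, the encoded pair is sensitive to order. It assumes that the elements are hashable Python values.

```python
def kuratowski_pair(a, b):
    """Encode the ordered pair (a, b) as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


# For sets, order does not matter ...
print({1, 2} == {2, 1})  # True

# ... but for encoded ordered pairs it does.
print(kuratowski_pair(1, 2) == kuratowski_pair(2, 1))  # False

# When a = b the encoding collapses to {{a}}, which is still a valid pair.
print(kuratowski_pair(1, 1) == frozenset({frozenset({1})}))  # True
```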


Definition 3.3.9. Let A and B be sets. The product (also called the Cartesian product)
of A and B, denoted A × B, is the set

A × B = {(a, b) | a ∈ A and b ∈ B},

where (a, b) denotes an ordered pair. △


Example 3.3.10.
(1) Let A = {a, b, c} and B = {1, 2}. Then

A × B = {(a,1), (a,2), (b,1), (b,2), (c,1), (c,2)}.

(2) Roll a pair of dice. The possible outcomes are

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

This table is the product of the set {1,...,6} with itself.
(3) It can be seen intuitively that if A and B are finite sets, then A × B is finite and
|A × B| = |A| · |B|. This fact is proved in Theorem 7.6.3. ♦
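
The three parts of this example can be mirrored in Python with `itertools.product` from the standard library, which enumerates ordered pairs as tuples. The sketch below is only an illustration for finite sets; the string labels for a, b and c are an assumption made for the sake of the example.

```python
from itertools import product

# Part (1): the product of A = {a, b, c} and B = {1, 2}.
A = {"a", "b", "c"}
B = {1, 2}
A_times_B = set(product(A, B))
print(sorted(A_times_B))

# Part (2): the outcomes of rolling a pair of dice, {1,...,6} x {1,...,6}.
dice = set(product(range(1, 7), repeat=2))
print(len(dice))  # 36

# Part (3): for finite sets, |A x B| = |A| * |B|.
print(len(A_times_B) == len(A) * len(B))  # True
```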


We can form the product of more than two sets, and although there is no essential
problem doing so, there is one slight technicality worth mentioning. Suppose that we
want to form the product of the three sets A, B and C. Keeping these sets in the given
order, we could form the triple product in two ways, yielding the sets (A × B) × C and
A × (B × C). Strictly speaking, these two triple products are not the same, because the
first has elements of the form ((a, b), c), whereas the second has elements of the form
(a, (b, c)). There is, however, no practical difference between the two triple products,
and we will therefore gloss over this technicality, simply referring to A × B × C, and
writing a typical element as (a, b, c). The precise relation between (A × B) × C and
A × (B × C), which is given in Exercise 4.4.6, makes use of the concepts developed
in Section 4.4.
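
The technicality about triple products is easy to see with nested tuples. In the sketch below, the three small sets are arbitrary illustrative choices; the two nested products are different sets, yet removing the inner parentheses from either one yields the same set of triples.

```python
from itertools import product

# Arbitrary illustrative sets.
A = {1, 2}
B = {"u", "v"}
C = {"p", "q"}

left = set(product(product(A, B), C))    # elements of the form ((a, b), c)
right = set(product(A, product(B, C)))   # elements of the form (a, (b, c))
flat = set(product(A, B, C))             # elements of the form (a, b, c)

# Strictly speaking, the two nested products are not the same set.
print(left == right)  # False

# Dropping the inner parentheses identifies both with the set of triples.
flattened_left = {(a, b, c) for ((a, b), c) in left}
flattened_right = {(a, b, c) for (a, (b, c)) in right}
print(flattened_left == flat and flattened_right == flat)  # True
```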


Example 3.3.11. We can think of ℝ², which is defined in terms of ordered pairs of
real numbers, as ℝ² = ℝ × ℝ. Similarly, we think of ℝⁿ as
ℝⁿ = ℝ × ⋯ × ℝ (n times). ♦

The following theorem gives some standard properties of products of sets. 
Theorem 3.3.12. Let A, B, C and D be sets. 


1. If A ⊆ B and C ⊆ D, then A × C ⊆ B × D.

2. A × (B ∪ C) = (A × B) ∪ (A × C) and (B ∪ C) × A = (B × A) ∪ (C × A)
(Distributive Laws).

3. A × (B ∩ C) = (A × B) ∩ (A × C) and (B ∩ C) × A = (B × A) ∩ (C × A)
(Distributive Laws).

4. A × ∅ = ∅ and ∅ × A = ∅.

5. (A ∩ B) × (C ∩ D) = (A × C) ∩ (B × D).


Proof. We will prove Part (3), leaving the rest to the reader in Exercise 3.3.8. 


(3). We will prove A × (B ∩ C) = (A × B) ∩ (A × C); the other equation can be
proved similarly, and we omit the details. As usual, we will show that the sets on
the two sides of the equation are subsets of each other. First, we show that
(A × B) ∩ (A × C) ⊆ A × (B ∩ C). This part of the proof proceeds in the standard way. Let
y ∈ (A × B) ∩ (A × C). It would not be correct at this point to say that y equals some
ordered pair (p, q), because (A × B) ∩ (A × C) does not have the form X × Y for some
sets X and Y. We can say, however, that y ∈ A × B and y ∈ A × C. Using the former
we deduce that y = (a, b) for some a ∈ A and b ∈ B. Because y ∈ A × C, we then
have (a, b) ∈ A × C. It follows that b ∈ C. Hence b ∈ B ∩ C. Therefore
y = (a, b) ∈ A × (B ∩ C). We deduce that (A × B) ∩ (A × C) ⊆ A × (B ∩ C).

Next, we show that A × (B ∩ C) ⊆ (A × B) ∩ (A × C). In this part of the proof we
take a slightly different approach than the one we have been using so far (though the
standard method would work here too). By Lemma 3.2.4 (1) we know that A ⊆ A.
Using the first sentence in Theorem 3.3.3 (1) we know that B ∩ C ⊆ B and B ∩ C ⊆ C.
By Part (1) of this theorem we deduce that A × (B ∩ C) ⊆ A × B and
A × (B ∩ C) ⊆ A × C. It now follows from the second sentence in Theorem 3.3.3 (1) that
A × (B ∩ C) ⊆ (A × B) ∩ (A × C).

We conclude that A × (B ∩ C) = (A × B) ∩ (A × C).


Observe that A × B is not the same as B × A, unless A and B happen to be equal
or one of them is empty.
The following example shows that the statement analogous to Part (5) of
Theorem 3.3.12, but with ∪ instead of ∩, is not true.


Example 3.3.13. Let A = {1,2} and B = {2,3} and C = {x,y} and D = {y,z}. First,
just to see that it works, we verify that Theorem 3.3.12 (5) holds for these sets. We
see that A ∩ B = {2} and C ∩ D = {y}, and so (A ∩ B) × (C ∩ D) = {(2,y)}, and that
A × C = {(1,x), (1,y), (2,x), (2,y)} and B × D = {(2,y), (2,z), (3,y), (3,z)}, and so
(A × C) ∩ (B × D) = {(2,y)}. Hence (A ∩ B) × (C ∩ D) = (A × C) ∩ (B × D).

Now replace ∩ with ∪ in the above calculation. We then have A ∪ B = {1,2,3}
and C ∪ D = {x,y,z}, and so (A ∪ B) × (C ∪ D) = {(1,x), (1,y), (1,z), (2,x), (2,y),
(2,z), (3,x), (3,y), (3,z)}. Using A × C and B × D as calculated in the previous
paragraph, we see that (A × C) ∪ (B × D) = {(1,x), (1,y), (2,x), (2,y), (2,z), (3,y), (3,z)}.
Therefore (A ∪ B) × (C ∪ D) ≠ (A × C) ∪ (B × D). The difference between the
situation in this paragraph and the previous one can be seen schematically in Figure 3.3.4,
which is not a Venn diagram, and where we need to think of A, B, C and D as subsets
of ℝ. ♦
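
The computation in Example 3.3.13 can be replayed mechanically, which also shows exactly which pairs account for the failure in the union case. The helper name `times` is hypothetical, and the elements x, y and z are represented by strings for the purposes of this sketch.

```python
from itertools import product


def times(S, T):
    """Return the product S x T as a set of ordered pairs (tuples)."""
    return set(product(S, T))


A, B = {1, 2}, {2, 3}
C, D = {"x", "y"}, {"y", "z"}

# Theorem 3.3.12 (5) holds for these sets.
print(times(A & B, C & D) == times(A, C) & times(B, D))  # True

# The analogous statement with union in place of intersection fails.
print(times(A | B, C | D) == times(A, C) | times(B, D))  # False

# The pairs in (A ∪ B) x (C ∪ D) that are missing from (A x C) ∪ (B x D):
missing = times(A | B, C | D) - (times(A, C) | times(B, D))
print(sorted(missing))  # [(1, 'z'), (3, 'x')]
```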


Exercises 


Exercise 3.3.1. Let A = {1,3,5,7} and B = {1,2,3,4}. Find each of the following
sets.

(1) A ∪ B.
(2) A ∩ B.
(3) A × B.
(4) A − B.
(5) B − A.


[Schematic for Example 3.3.13, with A, B, C and D drawn as subsets of ℝ]
Fig. 3.3.4.


Exercise 3.3.2. Let C = {a,b,c,d,e,f} and D = {a,c,e} and E = {d,e,f} and F =
{a,b}. Find each of the following sets.


(1) C − (D ∪ E).
(2) (C − D) ∪ E.
(3) F − (C − E).
(4) F ∩ (D ∪ E).
(5) (F ∩ D) ∪ E.
(6) (C − D) ∪ (F ∩ E).


Exercise 3.3.3. Let X = [0,5) and Y = [2,4] and Z = (1,3] and W = (3,5) be intervals
in ℝ. Find each of the following sets.


(1) Y ∪ Z.
(2) Z ∩ W.
(3) Y − W.
(4) X × W.
(5) (X ∩ Y) ∪ Z.
(6) X − (Z ∪ W).


Exercise 3.3.4. Let


G = {n ∈ ℤ | n = 2m for some m ∈ ℤ}
H = {n ∈ ℤ | n = 3k for some k ∈ ℤ}
I = {n ∈ ℤ | n² is odd}
J = {n ∈ ℤ | 0 < n < 10}.


Find each of the following sets.

(1) G ∪ I.
(2) G ∩ I.
(3) G ∩ H.
(4) J − G.
(5) I − H.
(6) J ∩ (G − H).


Exercise 3.3.5. Given two sets A and B, are the sets A − B and B − A necessarily
disjoint? Give a proof or a counterexample.

Exercise 3.3.6. [Used in Theorem 3.3.3.] Prove Theorem 3.3.3 (1) (2) (3) (6) (7) (8)
(9).

Exercise 3.3.7. [Used in Theorem 3.3.8.] Prove Theorem 3.3.8 (1) (2) (3) (4) (5) (6).

Exercise 3.3.8. [Used in Theorem 3.3.12.] Prove Theorem 3.3.12 (1) (2) (4) (5).

Exercise 3.3.9. [Used in Theorem 7.6.7.] Let A and B be sets. Prove that (A ∪ B) − A =
B − (A ∩ B).

Exercise 3.3.10. [Used in Theorem 6.3.6.] Let A, B and C be sets. Suppose that
C ⊆ A ∪ B, and that C ∩ A = ∅. Prove that C ⊆ B.


Exercise 3.3.11. Let X be a set, and let A, B, C ⊆ X be subsets. Suppose that A ∩ B =
A ∩ C, and that (X − A) ∩ B = (X − A) ∩ C. Prove that B = C.


Exercise 3.3.12. Let A, B and C be sets. Prove that (A − B) ∩ C = (A ∩ C) − B =
(A ∩ C) − (B ∩ C).


Exercise 3.3.13. [Used in Exercise 6.5.15.] For real numbers a, b and c, we know that
a − (b − c) = (a − b) + c. Let A, B and C be sets.


(1) Suppose that C ⊆ A. Prove that A − (B − C) = (A − B) ∪ C.

(2) Does A − (B − C) = (A − B) ∪ C hold for all sets A, B and C? Prove or give
a counterexample for this formula. If the formula is false, find and prove a
modification of this formula that holds for all sets.


Exercise 3.3.14. Let A and B be sets. The symmetric difference of A and B, denoted
A △ B, is the set A △ B = (A − B) ∪ (B − A).
Let X, Y and Z be sets. Prove the following statements.


(1) X △ ∅ = X.
(2) X △ X = ∅.
(3) X △ Y = Y △ X.
(4) X △ (Y △ Z) = (X △ Y) △ Z.
(5) X ∩ (Y △ Z) = (X ∩ Y) △ (X ∩ Z).
(6) X △ Y = (X ∪ Y) − (X ∩ Y).


Exercise 3.3.15. Prove or find a counterexample to the following statement. Let A,
B and C be sets. Then (A − B) ∪ C = (A ∪ B ∪ C) − (A ∩ B).


Exercise 3.3.16. Prove or find a counterexample to the following statement. Let A,
B and C be sets. Then (A ∪ C) − B = (A − B) ∪ (C − B).


Exercise 3.3.17. Let A, B and C be sets. Prove that A ⊆ C if and only if A ∪ (B ∩ C) =
(A ∪ B) ∩ C.


Exercise 3.3.18. Prove or give a counterexample for each of the following statements.


(1) Let A and B be sets. Then P(A ∪ B) = P(A) ∪ P(B).
(2) Let A and B be sets. Then P(A ∩ B) = P(A) ∩ P(B).


Exercise 3.3.19. Let A, B and C be sets. Prove that A × (B − C) = (A × B) − (A × C).


Exercise 3.3.20. Let A and B be sets. Suppose that B ⊆ A. Prove that A × A − B × B =
[(A − B) × A] ∪ [A × (A − B)].


Exercise 3.3.21. Let A and B be sets. Suppose that A ≠ B. Suppose that E is a set
such that A × E = B × E. Prove that E = ∅.


Exercise 3.3.22. Let X be a set. Suppose that X is finite. Which of the two sets
P(X × X) × P(X × X) and P(P(X)) has more elements?


3.4 Families of Sets 


So far we have dealt with unions and intersections of only two sets at a time. We now 
want to apply these operations to more than two sets. 

For the sake of comparison, let us look at addition of real numbers. Formally, 
addition is what is called a binary operation, which takes pairs of numbers as input 
and produces single numbers as output. We will see a rigorous treatment of binary 
operations in Section 7.1, but for now it is sufficient to take an informal approach 
to this concept. In particular, we see that in principle it is possible to add only two 
numbers at a time. Of course, in practice it is often necessary to add three or more 
numbers together, and here is how it is done. Suppose that we want to compute 
2 + 3 + 9. We would proceed in one of two ways, either first computing 2 + 3 = 5
and then computing 5 + 9 = 14, or first computing 3 + 9 = 12 and then computing
2 + 12 = 14. As expected, we obtained the same answer both ways, and this common
answer is what we would call the sum of the three numbers 2 + 3 + 9. Another way
of writing these two ways of computing the sum is as (2 + 3) + 9 and 2 + (3 + 9).
It turns out that there is a general rule about addition, called the Associative Law,
that says that in all cases of three numbers that are being added, the same result is
obtained from either way of positioning the parentheses. This property of addition is
stated in Theorem A.1 (1) in the Appendix. Hence, for any three numbers a, b and c,
we can define the sum a + b + c to be the number that results from computing either
(a + b) + c or a + (b + c).

Intuitively, a similar approach would work for the sum of any finite collection
of numbers, though to do so formally would require definition by recursion, a topic
we will see in Section 6.4; see Example 6.4.4 (2) for the use of recursion for adding
finitely many numbers. Sums of infinite collections of numbers are much trickier.
The reader has most likely encountered the notion of a series of numbers, that is, an
infinite sum of the form ∑_{n=1}^∞ a_n, in a calculus course. Not all such series actually
add up to a real number, and the question of figuring out for which series that happens
is somewhat tricky, especially if done rigorously, because it involves limits; see any
introductory real analysis text, for example [Blo11, Chapter 9], for details.

Let us now compare the above discussion of the addition of numbers to unions
and intersections of sets. For the union and intersection of three sets, the exact analog
holds because Theorem 3.3.3 (4) for union and intersection of sets is the exact analog
of Theorem A.1 (1) for addition and multiplication of numbers. Hence, if we are
given three sets A, B and C, and we wanted to form the union of all three of them, we
could compute either one of (A ∪ B) ∪ C and A ∪ (B ∪ C), which are always equal, and
we could label the result as A ∪ B ∪ C. The same idea holds for the intersection of three
sets. In principle, we could extend this idea to unions and intersections of any finite
collection of sets A_1, A_2, ..., A_n, where the word “finite” refers only to the number of
sets, not the sizes of the individual sets, which could be infinite. Once again, to make
such a definition work rigorously, we would need to use definition by recursion, and
so we cannot do it properly yet. As for the union of an infinite sequence of sets
A_1, A_2, A_3, ..., there is no simple analog for sets of series of numbers, and even if
there were, series of numbers are rather tricky, and presumably series of sets would
be too.
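
For a finite list of sets, the idea of repeatedly applying a binary operation can be sketched with `functools.reduce` from the Python standard library. The list of sets below is an arbitrary illustrative choice. The code folds the binary union once from the left and once from the right, and the Associative Laws are the reason the two groupings agree.

```python
from functools import reduce

# An arbitrary finite collection A_1, A_2, A_3 (illustrative values).
sets = [{1, 2, 3}, {2, 3, 4}, {2, 3, 9}]

# (A_1 ∪ A_2) ∪ A_3: combine two sets at a time, working from the left.
union_left = reduce(lambda acc, s: acc | s, sets)

# A_1 ∪ (A_2 ∪ A_3): combine two sets at a time, working from the right.
union_right = reduce(lambda acc, s: s | acc, reversed(sets))

# The same left fold for intersection.
intersection_left = reduce(lambda acc, s: acc & s, sets)

print(union_left == union_right)   # True
print(sorted(union_left))          # [1, 2, 3, 4, 9]
print(sorted(intersection_left))   # [2, 3]
```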

Fortunately, we can solve this problem for unions and intersections of sets in a 
very simple way that is not available to us with addition of numbers. For addition, we 
really do not have a choice but to add two numbers at a time, and then extend that by 
recursion to finite sums, and use limits for infinite sums. For unions and intersections, 
however, rather than defining unions and intersections of arbitrary collections of sets 
in terms of unions and intersections of two sets at a time, we can define unions and 
intersections of arbitrary collections of sets from scratch, using the case of two sets 
at a time simply by way of analogy. 

For two sets A_1 and A_2, the union A_1 ∪ A_2 is the set of all elements x such that
x ∈ A_1 or x ∈ A_2. We cannot directly generalize the notion of “or” to an
infinite collection of sets, because “or” is also defined for only two things at a time,
but let us look at A_1 ∪ A_2 slightly differently. Recall that for mathematics, the word
“or” always means the inclusive or, that is, one or the other or both. Hence, instead
of thinking of A_1 ∪ A_2 as the set of all elements x such that x ∈ A_1 or x ∈ A_2, we can
just as well think of it as the set of all elements x such that x ∈ A_i for some i ∈ {1,2}.
In other words, we have replaced the use of “or” in the definition of the union of two
sets with an existential quantifier. The advantage of this approach is that whereas “or”
cannot be generalized to more than two things at a time, the existential quantifier can
be used on sets of arbitrary size. Hence, if we have an infinite collection of sets
A_1, A_2, A_3, ..., we can define the union of these sets as the set of all elements x such
that x ∈ A_i for some i ∈ ℕ. Using notation that is analogous to the notation for series
of numbers, we then write


⋃_{i=1}^∞ A_i = A_1 ∪ A_2 ∪ A_3 ∪ ... = {x | x ∈ A_n for some n ∈ ℕ}.


Now let us look at intersections. For two sets A_1 and A_2, the intersection A_1 ∩ A_2
is the set of all elements x such that x ∈ A_1 and x ∈ A_2. The alternative approach is to
think of A_1 ∩ A_2 as the set of all elements x such that x ∈ A_i for all i ∈ {1,2}. Here we
have replaced the use of “and” in the definition of the intersection of two sets with
a universal quantifier. If we have an infinite collection of sets A_1, A_2, A_3, ..., we can
define the intersection of these sets as the set of all elements x such that x ∈ A_i for
all i ∈ ℕ, and we write


⋂_{i=1}^∞ A_i = A_1 ∩ A_2 ∩ A_3 ∩ ... = {x | x ∈ A_n for all n ∈ ℕ}.


Example 3.4.1.

(1) For each i ∈ ℕ, let B_i = {1, 2, ..., 3i}. Then ⋃_{i=1}^∞ B_i = ℕ and
⋂_{i=1}^∞ B_i = {1,2,3}.

(2) Recall the notation for intervals in ℝ in Definition 3.2.1. For each k ∈ ℕ, let
F_k = (1/k, 8 + 3/k). Then ⋃_{k=1}^∞ F_k = (0, 11) and ⋂_{k=1}^∞ F_k = (1, 8]. ♦
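
Part (1) of this example can be explored numerically by looking at finite truncations, that is, the union and intersection of only the first n of the sets {1, 2, ..., 3i}. The function names below are hypothetical. A computer can only ever examine finitely many of the sets, so the output suggests the stated answers without proving them: the partial unions keep growing, while the partial intersections stay fixed.

```python
def B(i):
    """The set B_i = {1, 2, ..., 3i} from Example 3.4.1 (1)."""
    return set(range(1, 3 * i + 1))


def partial_union(n):
    """B_1 ∪ B_2 ∪ ... ∪ B_n, for n >= 1."""
    return set().union(*(B(i) for i in range(1, n + 1)))


def partial_intersection(n):
    """B_1 ∩ B_2 ∩ ... ∩ B_n, for n >= 1."""
    return set.intersection(*(B(i) for i in range(1, n + 1)))


for n in (1, 2, 5, 10):
    # Largest element of the partial union, and the partial intersection.
    print(n, max(partial_union(n)), sorted(partial_intersection(n)))
```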


What we have said so far, though correct, is not sufficient for our purposes. Suppose,
for example, that for each real number x we define the set Q_x to be the set of all
real numbers less than x, so that Q_x = (−∞, x). Though it is not obvious, and it will
only be proved in Section 6.7, it turns out that there is no possible way to line up all
the sets of the form Q_x in order analogously to A_1, A_2, A_3, .... We are therefore not in
precisely the same situation as discussed previously. However, in contrast to a series of
numbers ∑_{n=1}^∞ a_n, where the definition of the sum depends very much upon
the order of the numbers in the series, for unions and intersections of sets the order
of the sets does not matter at all. In particular, if we look at the definitions of ⋃_{i=1}^∞ A_i
and ⋂_{i=1}^∞ A_i, we observe that we do not need to think of the sets A_1, A_2, A_3, ... as
written in order, and we can think of this collection of sets as having one set for each
number in ℕ. We can therefore rewrite ⋃_{i=1}^∞ A_i and ⋂_{i=1}^∞ A_i as ⋃_{i∈ℕ} A_i and ⋂_{i∈ℕ} A_i,
respectively.

The following definition, based upon the above ideas, will allow us to define
unions and intersections in the most general situation possible. We note that this
definition is based upon the informal distinction between sets and elements; we will see
a different approach when we discuss the Zermelo–Fraenkel Axioms for set theory
in Section 3.5.


Definition 3.4.2. Let 𝒜 be a set. The set 𝒜 is called a family of sets if all the elements
of 𝒜 are sets. The family of sets 𝒜 is indexed by I, denoted 𝒜 = {A_i}_{i∈I}, if there is
a non-empty set I such that there is an element A_i ∈ 𝒜 for each i ∈ I, and that every
element of 𝒜 equals A_i for exactly one i ∈ I. △


Observe that the empty set is a family of sets. When we define a family of sets, if
we do not need to view the family of sets as indexed, we will write “let 𝒜 be a family
of sets.” If we want to use an indexed family of sets, we will write “let I be a set, and
let {A_i}_{i∈I} be a family of sets indexed by I”; in such cases we will often not give the
family of sets a name such as 𝒜.

Although it is often easier to think of, and work with, families of sets when they
are indexed, it is important for various applications to have the non-indexed way of
working with families of sets as well. For example, suppose that we have a set A,
and we want to consider the family of all finite subsets of A; we could write such a
family as

𝒜 = {B | B ⊆ A and B is finite}.


(We have not formally defined finiteness yet, but the above example is just for
illustrative purposes; we will see the definition of finite sets in Section 6.5.) It is quite
natural to consider such families of sets in many parts of mathematics (not just
collections of finite subsets, but subsets characterized by other criteria as well), and
there is no natural way to index the elements of such a family of sets. Actually, that
is not quite true: we can index each element of 𝒜 by itself! That is, we can write
𝒜 = {A_X}_{X∈𝒜}, which would lead us to think of any family of sets as “self-indexed.”
However, while that is technically correct, in practice viewing every family of sets as
self-indexed is not particularly helpful, and so we will continue to think of families
of sets written as 𝒜 as non-indexed, and families of sets written as {A_i}_{i∈I} as indexed.
In our discussion of families of sets in this section, and our use of them in subsequent
sections, we will use both indexed and non-indexed notation as suits each situation.

On the one hand, families of sets are just sets, and hence everything that we 
have previously said about sets still holds for families of sets. For example, given 
two families of sets, we could ask whether one is a subset of the other. On the other 
hand, because all the elements of a family of sets are themselves sets, then we can do 
something special with families of sets, which is to take the union and intersection 
of all the elements of the family of sets, which we define as follows. 


Definition 3.4.3. Let 𝒜 be a family of sets. The union of the sets in 𝒜, denoted
⋃_{X∈𝒜} X, is defined as follows. If 𝒜 ≠ ∅, then


⋃_{X∈𝒜} X = {x | x ∈ A for some A ∈ 𝒜};


if 𝒜 = ∅, then ⋃_{X∈𝒜} X = ∅. The intersection of the sets in 𝒜, denoted ⋂_{X∈𝒜} X, is
defined as follows. If 𝒜 ≠ ∅, then


⋂_{X∈𝒜} X = {x | x ∈ A for all A ∈ 𝒜};


if 𝒜 = ∅, then ⋂_{X∈𝒜} X is not defined.
If 𝒜 = {A_i}_{i∈I} is indexed by a set I, then we write


⋃_{i∈I} A_i = {x | x ∈ A_i for some i ∈ I} and ⋂_{i∈I} A_i = {x | x ∈ A_i for all i ∈ I}


to denote the union and intersection of the sets in 𝒜, respectively. △


Intuitively, the set ⋃_{i∈I} A_i is the set that contains everything that is in at least one
of the sets A_i; the set ⋂_{i∈I} A_i is the set containing everything that is in all of the sets
A_i. The same holds for the non-indexed notation.

Formally, the proper way to describe ⋃_{X∈𝒜} X is as “the union of the elements of
the family of sets 𝒜,” but informally we simply say “the union of the sets in 𝒜” or
“the union of the family of sets 𝒜,” and similarly for intersection.
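
For finite families of finite sets, the definition above translates almost word for word into code. The sketch below uses the hypothetical function names `union_of_family` and `intersection_of_family`; it returns the empty set for the union of the empty family and refuses to compute the intersection of the empty family, mirroring the two special cases in the definition. An indexed family is modeled, as an assumption of this sketch, by a dictionary that sends each index i to the set A_i.

```python
def union_of_family(family):
    """Union of the sets in `family`; the empty family has union ∅."""
    return frozenset(x for member in family for x in member)


def intersection_of_family(family):
    """Intersection of the sets in `family`; undefined for the empty family."""
    members = list(family)
    if not members:
        raise ValueError("the intersection of the empty family is not defined")
    first, *rest = members
    # x is kept exactly when x is in every member of the family.
    return frozenset(x for x in first if all(x in member for member in rest))


# A non-indexed family: a set whose elements are sets.
family = {frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 5})}
print(sorted(union_of_family(family)))         # [1, 2, 3, 4, 5]
print(sorted(intersection_of_family(family)))  # [3]

# An indexed family {A_i} with index set I = {1, 2, 3} and A_i = {i, i+1, i+2}.
indexed = {i: frozenset(range(i, i + 3)) for i in (1, 2, 3)}
print(sorted(union_of_family(indexed.values())))         # [1, 2, 3, 4, 5]
print(sorted(intersection_of_family(indexed.values())))  # [3]
```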
Example 3.4.4.


(1) For each x ∈ ℝ, let C_x be the interval C_x = [−2, sin x]. Then ⋃_{x∈ℝ} C_x = [−2, 1]
and ⋂_{x∈ℝ} C_x = [−2, −1].

(2) Let ℱ be the family of all finite subsets of ℕ. Then ⋃_{X∈ℱ} X = ℕ and
⋂_{X∈ℱ} X = ∅. ♦


The following theorem gives some of the standard properties of unions and
intersections of arbitrary families of sets, generalizing various properties we saw in
Section 3.3. Part (1) of the theorem says that ⋂_{i∈I} A_i is the largest set contained in
all the sets in {A_i}_{i∈I}, and Part (2) of the theorem says that ⋃_{i∈I} A_i is the smallest set
containing all the sets in {A_i}_{i∈I}. To allow the reader to gain familiarity with both the
indexed and the non-indexed notations, we state the theorem in both forms, proving
one part of the theorem using one notation, and another part of the theorem using the
other notation. Subsequent theorems will be stated in only one of these two styles
(usually the indexed notation), leaving it to the reader to convert it to the other style
as needed.


Theorem 3.4.5.
Non-Indexed Version: Let 𝒜 be a non-empty family of sets and let B be a set.


1. ⋂_{X∈𝒜} X ⊆ A for all A ∈ 𝒜. If B ⊆ X for all X ∈ 𝒜, then B ⊆ ⋂_{X∈𝒜} X.
2. A ⊆ ⋃_{X∈𝒜} X for all A ∈ 𝒜. If X ⊆ B for all X ∈ 𝒜, then ⋃_{X∈𝒜} X ⊆ B.
3. B ∩ (⋃_{X∈𝒜} X) = ⋃_{X∈𝒜} (B ∩ X) (Distributive Law).
4. B ∪ (⋂_{X∈𝒜} X) = ⋂_{X∈𝒜} (B ∪ X) (Distributive Law).
5. B − (⋃_{X∈𝒜} X) = ⋂_{X∈𝒜} (B − X) (De Morgan’s Law).
6. B − (⋂_{X∈𝒜} X) = ⋃_{X∈𝒜} (B − X) (De Morgan’s Law).


Indexed Version: Let I be a non-empty set, let {A_i}_{i∈I} be a family of sets indexed by
I and let B be a set.


1. ⋂_{i∈I} A_i ⊆ A_k for all k ∈ I. If B ⊆ A_k for all k ∈ I, then B ⊆ ⋂_{i∈I} A_i.
2. A_k ⊆ ⋃_{i∈I} A_i for all k ∈ I. If A_k ⊆ B for all k ∈ I, then ⋃_{i∈I} A_i ⊆ B.
3. B ∩ (⋃_{i∈I} A_i) = ⋃_{i∈I} (B ∩ A_i) (Distributive Law).
4. B ∪ (⋂_{i∈I} A_i) = ⋂_{i∈I} (B ∪ A_i) (Distributive Law).
5. B − (⋃_{i∈I} A_i) = ⋂_{i∈I} (B − A_i) (De Morgan’s Law).
6. B − (⋂_{i∈I} A_i) = ⋃_{i∈I} (B − A_i) (De Morgan’s Law).

Proof. We will prove Parts (3) and (6), leaving the rest to the reader in Exercise 3.4.3.


(3). Let x ∈ B ∩ (⋃_{i∈I} A_i). Then x ∈ B and x ∈ ⋃_{i∈I} A_i. It follows that x ∈ A_k for
some k ∈ I. Hence x ∈ B ∩ A_k. Therefore x ∈ ⋃_{i∈I} (B ∩ A_i) by Part (2) of this theorem.
Hence B ∩ (⋃_{i∈I} A_i) ⊆ ⋃_{i∈I} (B ∩ A_i).

Now let y ∈ ⋃_{i∈I} (B ∩ A_i). Then y ∈ B ∩ A_j for some j ∈ I. Hence y ∈ B and y ∈ A_j.
Therefore y ∈ ⋃_{i∈I} A_i by Part (2) of this theorem. It follows that y ∈ B ∩ (⋃_{i∈I} A_i).
Hence ⋃_{i∈I} (B ∩ A_i) ⊆ B ∩ (⋃_{i∈I} A_i).

We conclude that B ∩ (⋃_{i∈I} A_i) = ⋃_{i∈I} (B ∩ A_i).

(6). Let a ∈ B − (⋂_{X∈𝒜} X). Then a ∈ B and a ∉ ⋂_{X∈𝒜} X. Then a ∉ Y for some
Y ∈ 𝒜. Then a ∈ B − Y. Hence a ∈ ⋃_{X∈𝒜} (B − X) by Part (2) of this theorem. It follows
that B − (⋂_{X∈𝒜} X) ⊆ ⋃_{X∈𝒜} (B − X).

Now let b ∈ ⋃_{X∈𝒜} (B − X). Then b ∈ B − Z for some Z ∈ 𝒜. Then b ∈ B and
b ∉ Z. Hence b ∉ ⋂_{X∈𝒜} X. It follows that b ∈ B − ⋂_{X∈𝒜} X. Therefore
⋃_{X∈𝒜} (B − X) ⊆ B − (⋂_{X∈𝒜} X).

We conclude that B − (⋂_{X∈𝒜} X) = ⋃_{X∈𝒜} (B − X).


It can be verified that all the parts of Theorem 3.4.5 that involve union but not
intersection hold also when 𝒜 = ∅; the parts of the theorem that involve intersection
are not defined when 𝒜 = ∅.

It is interesting to compare the proof of Theorem 3.4.5 (3) with the proof of The- 
orem 3.3.3 (5). Though Theorem 3.4.5 (3) is a generalization of Theorem 3.3.3 (5), 
the proof of the generalized statement is slightly more concise than the proof of the 
simpler statement. The proof of Theorem 3.4.5 (3) is more concise precisely because 
it is phrased explicitly in terms of quantifiers, which allows us to avoid the need for 
cases as in the proof of Theorem 3.3.3 (5). 
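
The role of quantifiers can be made concrete in code, where “for some” and “for all” correspond to Python's built-in `any` and `all`. The sketch below checks Parts (3) and (6) of Theorem 3.4.5 on one small, arbitrarily chosen family by writing both sides of each equation directly in terms of quantifiers. The names `in_union` and `in_intersection` are hypothetical, and the check is an illustration on a single example rather than a proof.

```python
# An arbitrary finite family of sets and a set B (illustrative values).
family = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
B = {0, 1, 2, 3, 5}
universe = B.union(*family)  # every element that could appear on either side


def in_union(x):
    """x is in the union: x is in A for SOME A in the family."""
    return any(x in A for A in family)


def in_intersection(x):
    """x is in the intersection: x is in A for ALL A in the family."""
    return all(x in A for A in family)


# Part (3): B ∩ (⋃ A) = ⋃ (B ∩ A)
lhs3 = {x for x in universe if x in B and in_union(x)}
rhs3 = {x for x in universe if any(x in B and x in A for A in family)}

# Part (6): B − (⋂ A) = ⋃ (B − A)
lhs6 = {x for x in universe if x in B and not in_intersection(x)}
rhs6 = {x for x in universe if any(x in B and x not in A for A in family)}

print(lhs3 == rhs3, sorted(lhs3))  # True [1, 2, 3, 5]
print(lhs6 == rhs6, sorted(lhs6))  # True [0, 1, 2, 5]
```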

In addition to defining the union and intersection of families of sets, it is also 
possible to form the product of a family of sets, though doing so requires the use of 
functions, and hence we will wait until the end of Section 4.5 for the definition. 


Exercises 


Exercise 3.4.1. In each of the following parts, we are given a set B_k for each k ∈ ℕ.
Find ⋃_{k∈ℕ} B_k and ⋂_{k∈ℕ} B_k.


(1) By = {0,1,2,3,...,2k}. (4) By = [-1,3+ f]U[5, 4). 
(2) By = {k—1,k,k+ 1}. (5) By = (—$,1)U (2, He}. 
(3) By = (2, #2) {10+4}. (6) By = (0, 4} u[7, 4). 


Exercise 3.4.2. In each of the following parts, you need to find a family of sets
{E_k}_{k∈ℕ} such that E_k ⊆ ℝ for each k ∈ ℕ, that no two sets E_k are equal to each other
and that the given conditions hold.


(1) ⋃_{k∈ℕ} E_k = [0, ∞) and ⋂_{k∈ℕ} E_k = [0, 1].

(2) ⋃_{k∈ℕ} E_k = (0, ∞) and ⋂_{k∈ℕ} E_k = ∅.

(3) ⋃_{k∈ℕ} E_k = ℝ and ⋂_{k∈ℕ} E_k = {3}.

(4) ⋃_{k∈ℕ} E_k = (2, 8) and ⋂_{k∈ℕ} E_k = [3, 6].

(5) ⋃_{k∈ℕ} E_k = (0, ∞) and ⋂_{k∈ℕ} E_k = {1} ∪ (2, 3).

(6) ⋃_{k∈ℕ} E_k = ℤ and ⋂_{k∈ℕ} E_k = {..., −2, 0, 2, 4, 6, ...}.

(7) ⋃_{k∈ℕ} E_k = ℝ and ⋂_{k∈ℕ} E_k = ℕ.


Exercise 3.4.3. [Used in Theorem 3.4.5.] Prove Theorem 3.4.5 (1) (2) (4) (5). Do 
some in the indexed notation and some in the non-indexed notation. 


Exercise 3.4.4. Let 4 and B be non-empty families of sets. Suppose that .4 C B. 
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(1) Prove that UyeqgX C UvyesY. 
(2) Prove that QyeqX CflyeaY. 


Exercise 3.4.5. Let J be a non-empty set, and let {A;},-, and {B;},<,; be families of 


sets indexed by J. Suppose that A; C B; for alli € I. 


iel 
(1) Prove that Uj<7Ai © Uje; Bi. 
(2) Prove that )je;Ai © (J je7 Bi. 

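Exercise 3.4.6. Let $\mathcal{A}$ be a non-empty family of sets and let $B$ be a set. 

(1) Prove that $(\bigcup_{X \in \mathcal{A}} X) - B = \bigcup_{X \in \mathcal{A}} (X - B)$. 
(2) Prove that $(\bigcap_{X \in \mathcal{A}} X) - B = \bigcap_{X \in \mathcal{A}} (X - B)$. 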


Exercise 3.4.7. Let $I$ be a non-empty set, let $\{A_i\}_{i \in I}$ be a family of sets indexed by 
$I$ and let $B$ be a set. 

(1) Prove that $B \times (\bigcup_{i \in I} A_i) = \bigcup_{i \in I} (B \times A_i)$. 
(2) Prove that $B \times (\bigcap_{i \in I} A_i) = \bigcap_{i \in I} (B \times A_i)$. 


Exercise 3.4.8. Suppose that $W$ is some property of subsets of $\mathbb{R}$ (for example, being 
finite). A subset $X \subseteq \mathbb{R}$ is called co-$W$ if $\mathbb{R} - X$ has property $W$. 

Let $\mathcal{A}$ be a non-empty family of sets. Suppose that $X$ is a co-$W$ subset of $\mathbb{R}$ for all 
$X \in \mathcal{A}$. For each of the properties $W$ listed below, either prove that $\bigcup_{X \in \mathcal{A}} X$ is co-$W$, 
or give a counterexample. Try to figure out a general rule for deciding when $\bigcup_{X \in \mathcal{A}} X$ 
is co-$W$ for a given property $W$. 


(1) A subset of R has property W if and only if it is finite. 

(2) A subset of R has property W if and only if it has at most 7 elements. 

(3) A subset of R has property W if and only if it has precisely 7 elements. 

(4) A subset of R has property W if and only if it contains only integers. 

(5) A subset of R has property W if and only if it is finite, and has an even 
number of elements. 


3.5 Axioms for Set Theory 


Set theory works so well, and is so broadly useful, that it is used as the basis for 
modern mathematics. Unfortunately, however, it does not work quite as nicely as we 
might have made it appear in the previous sections of this chapter. Early in the 
development of set theory, a number of “paradoxes” were discovered, the best known 
of which is Russell’s Paradox, which is as follows. 

Suppose that we could form the set of all sets; let $S$ denote this set. Observe 
that $S \in S$. We then define the set $T = \{A \in S \mid A \notin A\}$. Is $T$ a member of itself? 
Suppose first that $T \notin T$. Then $T$ satisfies the condition that defines $T$, and hence 
$T \in T$. Now suppose that $T \in T$. Then, by the definition of $T$, we have $T \notin T$. Either 
way we reach a contradiction. The problem is that we are trying to use a set of all sets, and 
more generally the problem is that we have to be more careful how we quantify over 
sets. See [GG94, Section 5.3] for further comments on the paradoxes of set theory. 

In our use of sets in this text, as well as in the use of sets in much of mathematics, 
problems such as Russell’s Paradox do not arise because we do not use the set of all 
sets and similar problematic constructs. To treat set theory rigorously, however, some 
subtlety is needed. Various axiom systems for set theory have been developed that 
avoid paradoxes such as Russell’s Paradox. See [Vau95, Introduction] for a succinct 
discussion of the history of set theory and its axiomatization. 

The first axiom scheme for set theory was due to Zermelo in 1908. This scheme 
was subsequently modified into what is now the most commonly used axiom system 
for set theory, which is referred to as the Zermelo–Fraenkel Axioms. These axioms 
are often abbreviated as “ZF.” There are a number of equivalent variations of ZF, 
of which we state one below. See [End77] or [Sto79, Chapter 7] for an accessible 
discussion of the ZF axioms, and see [Lev02] for a more advanced look at these 
axioms. 

The ZF axioms are properly formulated in the context of symbolic logic, in which 
case the axioms are written out in logical notation. Because our purpose here is just 
to gain an informal familiarity with the ZF axioms, and because it would take us too 
far afield to develop the needed logic, we will write the axioms informally (though 
we will write the first one in logical symbols just to show that it can be done). 

We need two additional comments before listing the axioms. First, whereas 
informally we tend to distinguish between sets and elements (for example, we think 
of $A = \{1, 2\}$ as a set and each of 1 and 2 as elements), in the ZF axioms we make 
no such distinction. Everything in the ZF axioms is a set. It might seem strange to 
think of the numbers 1 and 2 as sets, but from the perspective of the ZF axioms the 
symbols “1” and “2” denote the sets $\{\emptyset\}$ and $\{\emptyset, \{\emptyset\}\}$, respectively. (We will say a 
bit more about this idea shortly.) 

Once we assume that everything in the ZF axioms is a set, then the relation of 
elementhood, which is denoted by the symbol $\in$, is a relation between sets. That is, 
given two sets $x$ and $y$, it might or might not be the case that $x \in y$. However, even 
if $x \in y$, we still think of both $x$ and $y$ as sets, regardless of whether or not one is 
an element of the other. Of course, if everything is a set, and we do not distinguish 
between what is a set and what is an element, we have to worry about potentially 
problematic constructions such as x = {x}, which would not specify what the set x 
is, because x is defined in terms of itself. Fortunately, the ZF axioms are designed 
to prevent such problems; this particular problem is disallowed by the Axiom of 
Regularity. 

Of course, if everything in the ZF axioms is viewed as a set, then the concept of a 
“family of sets” as discussed in Section 3.4 is unnecessary, because every non-empty 
set is a family of sets. Nonetheless, for the informal approach to sets used on a daily 
basis in modern mathematics, and used throughout this text (other than during our 
discussion of the ZF Axioms), the distinction between sets and elements, and the 
notion of a family of sets, are quite useful, and we will continue to make use of these 
ideas. 

Second, because the ZF axioms are formulated in terms of formal logic, and 
because we are not discussing such logic, our treatment of the ZF axioms will 
necessarily be informal. See [EFT94] and [Mal79] for more about logic. In 
particular, whereas most of the axioms can be stated in terms of familiar logical 
notions (for example, “and,” “or,” “not” and “for all”), which we saw informally in 
Sections 1.2 and 1.5, two of the axioms (the Axiom of Selection and the Axiom of 
Replacement) use the concepts of logical properties of sets, which require a more 
extensive treatment of logic than we have the ability to provide here. Hence, our 
phrasing of these two axioms is, unfortunately, not entirely satisfactory. 

Here, finally, are the ZF axioms. 


Axiom of Extensionality Let x and y be sets. If x and y have the same elements, 
then x = y. 


This axiom is simply another way of stating the definition of the equality of sets 
that we saw in Definition 3.2.5. This axiom can be written in logical notation as 


$\forall x \, \forall y \, (\forall z \, (z \in x \leftrightarrow z \in y) \rightarrow x = y)$. 


We will not write the other axioms in this type of notation, but they all can, and 
should, be written that way in the context of a more detailed look at the axioms via 
the study of logic. 


Axiom of Empty Set There is a set $z$ such that $x \notin z$ for all sets $x$. 


This axiom is also referred to as the Axiom of Null Set. By the Axiom of 
Extensionality there is only one set $z$ as described in this axiom, and this set is 
usually denoted $\emptyset$. 


Axiom of Pairing Let $x$ and $y$ be sets. There is a set $z$ such that $w \in z$ if and only 
if $w = x$ or $w = y$. 


The set $z$ described in the axiom is unique by the Axiom of Extensionality, and it 
is denoted $\{x, y\}$. In this axiom it is not required that $x \neq y$, and we abbreviate $\{x, x\}$ 
by $\{x\}$. 


Axiom of Union Let $x$ be a set. There is a set $z$ such that $w \in z$ if and only if there 
is some $y \in x$ such that $w \in y$. 


Once again the set $z$ described in the axiom is unique, and it is denoted $\bigcup x$. 
Because the ZF axioms allow us to view every set as a family of sets, the set $\bigcup x$ is the 
same as what we informally defined as $\bigcup_{y \in x} y$ in Section 3.4. 


Axiom of Power Set Let $x$ be a set. There is a set $z$ such that $w \in z$ if and only if 
$w \subseteq x$. 


The notation “$w \subseteq x$” is simply an abbreviated way of writing the expression 
“$y \in w$ implies $y \in x$,” so that it is valid to use that notation in the ZF axioms. Once 
again the set $z$ described in this axiom is unique, and it is the same as what we 
informally defined as $\mathcal{P}(x)$ in Section 3.2. 
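
For finite sets, the sets whose existence is asserted by the Axioms of Pairing, Union and Power Set can be computed directly. The following Python sketch is only an informal illustration and is not part of the ZF axioms: it models sets as nested `frozenset` values, and the names `pair`, `big_union` and `power_set` are our own choices for this example.

```python
from itertools import combinations


def pair(x, y):
    """The set {x, y} of the Axiom of Pairing; pair(x, x) is {x}."""
    return frozenset({x, y})


def big_union(x):
    """The set of all w such that w is in y for some y in x (Axiom of Union)."""
    return frozenset(w for y in x for w in y)


def power_set(x):
    """The set of all subsets of the finite set x (Axiom of Power Set)."""
    elements = list(x)
    return frozenset(
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    )


empty = frozenset()        # the empty set
one = pair(empty, empty)   # {empty}
two = pair(empty, one)     # {empty, {empty}}
```

Here `big_union(pair(one, two))` equals `two`, and `power_set(two)` has four elements, as we would expect for a set with two elements.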


Axiom of Regularity Let $x$ be a set. Suppose that $x \neq \emptyset$. Then there is some $y \in x$ 
such that $x \cap y = \emptyset$. 


This axiom is also referred to as the Axiom of Foundation. The notation “$x \cap y = \emptyset$” 
is simply an abbreviated way of writing the expression “there does not exist a set 
$z$ such that $z \in x$ and $z \in y$,” so that it is valid to use that notation in the ZF axioms. This 
axiom is needed to rule out problematic situations such as a non-empty set $x$ such 
that $x \in x$. To see why $x \in x$ is not allowed when $x \neq \emptyset$, observe that if there were 
such a set $x$, then the set $\{x\}$ would violate the Axiom of Regularity: the only element 
of $\{x\}$ is $x$, and $x \in \{x\} \cap x$, so that $\{x\} \cap x \neq \emptyset$. 
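
For a finite set built up from the empty set, an element $y$ as in the Axiom of Regularity can be found by direct search. The following Python sketch is an informal illustration, assuming that sets are modeled as nested `frozenset` values; the function name `regularity_witness` is hypothetical. Because a `frozenset` can be built only from values that already exist, no such value can be an element of itself.

```python
def regularity_witness(x):
    """Return some y in x such that x and y have no common element, or None."""
    for y in x:
        if not (x & y):  # x & y is the intersection of x and y
            return y
    return None


empty = frozenset()
one = frozenset({empty})        # {empty}
two = frozenset({empty, one})   # {empty, {empty}}
```

For the set `two`, the only possible witness is `empty`, because the other element `one` shares the element `empty` with `two`.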


Axiom of Selection Let $P(t)$ be a logical property of sets with one free variable $t$ 
that can be formulated in the context of the ZF axioms. Let $x$ be a set. Then 
there is a set $z$ such that $y \in z$ if and only if $y \in x$ and $P(y)$ is true. 


This axiom has a variety of names, including the Axiom of Specification, Axiom 
of Comprehension and Axiom of Separation. The set $z$ described in the axiom is 
unique, and it is usually denoted $\{y \in x \mid P(y)\}$. It is very important to observe that 
this axiom states that we can take an existing set, and then form the subset of those 
elements that satisfy the given property. It is not possible to define a set of elements 
that satisfy a given property if it is not specified what set the elements belong to. 
Consider, for example, the definition of the union and intersection of a family of sets 
given in Definition 3.4.3, which said 


$\bigcup_{i \in I} A_i = \{x \mid x \in A_i \text{ for some } i \in I\}$ and $\bigcap_{i \in I} A_i = \{x \mid x \in A_i \text{ for all } i \in I\}$. 


In fact, Definition 3.4.3 is not valid as stated if we adhere to the Axiom of 
Selection, because we are not specifying which set the element $x$ belongs to. It 
would be tempting to write something such as “let $S$ be the set of all sets, and let 
$\bigcup_{i \in I} A_i = \{x \in S \mid x \in A_i \text{ for some } i \in I\}$,” but that would not be valid, because we saw 
at the start of this section that the set of all sets is not a concept we can use. The 
reader might, quite reasonably, be troubled that we used a definition in Section 3.4 
that was not technically valid, but no real harm was done (other than perhaps to the 
credibility of the author). Definition 3.4.3 conveys the correct intuitive idea behind 
the union and intersection of a family of sets, and given that we had not yet dis- 
cussed the ZF axioms, it was the best we could do at the time. Moreover, the ZF 
axioms allow for a rigorous treatment of union and intersection, and this rigorous 
approach works precisely as did the intuitive approach used in Definition 3.4.3, so 
we can confidently continue to use union and intersection just as we have until now. 

Interestingly, the way the ZF axioms treat union is quite different from the way they 
treat intersection. The ability to take the union of a family of sets is given 
axiomatically in the Axiom of Union; by contrast, the ability to take the intersection of a 
family of sets can be deduced from the ZF axioms, as follows. Let $x$ be a non-empty 
set. Then $\bigcap_{z \in x} z$ is defined to be $\{y \in \bigcup x \mid y \in z \text{ for all } z \in x\}$, which is possible 
by the Axiom of Selection. 
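
The definition of intersection just given can be imitated for finite families. The following Python sketch is an informal illustration only: it models sets as `frozenset` values, it allows integers as elements for readability (which the ZF axioms themselves would not), and the names `select`, `big_union` and `big_intersection` are our own.

```python
def select(x, prop):
    """Imitate the Axiom of Selection: the set of all y in x with prop(y) true."""
    return frozenset(y for y in x if prop(y))


def big_union(x):
    """The set of all w such that w is in y for some y in x."""
    return frozenset(w for y in x for w in y)


def big_intersection(x):
    """Intersection of a non-empty family x, selected from the union of x."""
    if not x:
        raise ValueError("the family must be non-empty")
    return select(big_union(x), lambda y: all(y in z for z in x))


family = frozenset({
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({3, 5}),
})
```

The intersection is formed inside the existing set `big_union(family)`, so the selection always takes place within a set that is already known to exist.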

Although we used the name “Axiom of Selection,” this axiom is actually not a 
single axiom, but a collection of axioms, one axiom for each property P(t). Such a 
collection of axioms is often called an “axiom schema.” It is not possible to coalesce 
this collection of axioms into a single axiom, because it is not possible in the ZF 
axioms to quantify over all possible properties of sets. 


Axiom of Infinity There is a set $z$ such that $\emptyset \in z$, and if $x \in z$ then $x \cup \{x\} \in z$. 


We can make use of $\emptyset$ and $\cup$ in the Axiom of Infinity because of the Axiom of 
Empty Set and the Axiom of Union, and we can make use of $\{x\}$ because of the 
Axiom of Power Set and the Axiom of Selection, though we omit the details of the 
latter. The set $z$ in this axiom, which is not necessarily unique, can be thought of 
informally as a set that contains the sets 


$\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \ldots$. (3.5.1) 


Intuitively, any such set z must be infinite, which leads to the name of the axiom. (We 
have not yet formally defined what it means for a set to be infinite; we will see that 
in Section 6.5.) 
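
The first few sets in the list (3.5.1) can be generated mechanically by starting with the empty set and repeatedly passing from $x$ to $x \cup \{x\}$. The following Python sketch is an informal illustration that models sets as nested `frozenset` values; the names `successor` and `numeral` are hypothetical, and of course no program can produce the whole infinite set whose existence the axiom asserts.

```python
def successor(x):
    """Return the union of x and {x}."""
    return x | frozenset({x})


def numeral(n):
    """Return the set obtained from the empty set by applying successor n times."""
    current = frozenset()
    for _ in range(n):
        current = successor(current)
    return current
```

In this model `numeral(1)` and `numeral(2)` are the sets denoted “1” and “2” in the ZF axioms, and `numeral(n)` has exactly `n` elements, namely `numeral(0)`, ..., `numeral(n - 1)`.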


Axiom of Replacement Let $F(s,t)$ be a functional property of sets with two free 
variables $s$ and $t$ that can be formulated in the context of the ZF axioms. Let $x$ 
be a set. Then there is a set $z$ such that $y \in z$ if and only if there is some $w \in x$ 
such that $F(w,y)$ is true. 


As with the Axiom of Selection, the Axiom of Replacement is also a collection 
of axioms, one for each functional property F. A functional property is the formal 
way of describing what is standardly called a function. We will discuss functions 
extensively in Chapter 4; in particular, we will see in Section 4.1 that functions can 
be described in terms of sets, and hence functional properties are valid in the ZF 
axioms. We will put off any further discussion of functions till Chapter 4. The Axiom 
of Replacement is used primarily for technical purposes in advanced set theory, and 
we will not discuss it any further. See [Pot04, Appendix A.3] for some philosophical 
reservations about the Axiom of Replacement, though other mathematicians do not 
seem to have qualms about this axiom. 

The ZF axioms can be used not only to prove many useful facts about sets, but 
also to construct many familiar mathematical objects, for example the set of natural 
numbers. The basic idea of this construction is found in the list of sets in 
Equation 3.5.1. These sets should remind the reader of the numbers 
$0, 0+1, 0+1+1, 0+1+1+1, \ldots$, which intuitively are the non-negative integers; the natural numbers 
are then obtained by removing 0 from this set. The Axiom of Infinity guarantees the 
existence of at least one set w that contains these “numbers,” though the set w is not 
necessarily unique. We then let z be the intersection of all subsets of w that contain 
these “numbers,” and it can be seen that z is then the minimal such set. With some 
work, the set z is seen to contain precisely the sets listed in Equation 3.5.1, and is 
seen to behave just as one would expect the set of natural numbers together with 0 to 
behave; the choice of w turns out not to matter. The details of this construction may 
be found in [End77, Chapter 4]. 

Once the natural numbers have been defined, it is possible to construct the 
integers, the rational numbers and the real numbers from the natural numbers. See 
[Blo11, Chapter 1] for the construction of these number systems. Moreover, many 
branches of mathematics such as real analysis (the rigorous study of calculus) and 
Euclidean geometry are based upon the properties of the real numbers. Hence, if we 
accept the ZF axioms, then we have at our disposal the familiar number systems with 
their standard properties, and a variety of branches of mathematics, all constructed 
completely rigorously. 

A review of the ZF axioms raises the question of why these particular axioms and 
not others were chosen. The answer is that it would be possible to use variants of the 
axioms. These particular axioms were chosen because they seem to be convenient to 
work with, and because they suffice to imply everything that needs to be done with 
sets. 

Is there any redundancy in the ZF axioms? In other words, is it possible to prove 
one or more of the axioms from the remaining ones? The answer is yes. For 
example, the Axiom of Infinity together with the Axiom of Selection imply the Axiom 
of Empty Set, because the Axiom of Infinity states that there is some set $w$, and the 
Axiom of Selection implies that we can define a new set $z = \{x \in w \mid x \neq x\}$, which 
has no elements and hence satisfies the Axiom of Empty Set. Hence, in principle, it would be possible to 
drop the Axiom of Empty Set from the ZF axioms if we want to have the smallest 
possible set of axioms. However, the Axiom of Empty Set is used regularly through- 
out mathematics, and because it is so important, it is standard to include this axiom 
in the ZF axioms, even though keeping it is redundant. 

Similarly, though of a more technical nature, it is shown in [Lev02, Section I.5] 
that the Axiom of Selection can be proved using the Axiom of Replacement and other 
axioms, which means that the Axiom of Selection is redundant. In practice, however, 
the Axiom of Selection is used frequently throughout mathematics, whereas the Ax- 
iom of Replacement is not used nearly as often, so both axioms are included in the 
ZF axioms, the former to emphasize its usefulness, and the latter because it is needed 
for some technicalities. 

Are the ZF axioms consistent? That is, are we certain that if we deduce every- 
thing that can be deduced from the ZF axioms, we would never encounter a logical 
contradiction? The answer is that we cannot be completely sure. In general, if some- 
one starts with a set of axioms and deduces a specific logical contradiction, then 
we know that the set of axioms is inconsistent; on the other hand, if no one has yet 
produced a logical contradiction from a set of axioms, we cannot know if that is be- 
cause no logical contradiction can possibly be deduced, or if that is because there is 
a logical contradiction waiting to be found and it has just not been found yet. 

However, even if no one has definitively proved that the ZF axioms are consistent, 
we observe that these axioms have been designed to remove the known problems of 
naive set theory such as Russell’s Paradox. As discussed at the start of the section, 
Russell’s Paradox arises when we let $S$ denote the set of all sets, and then look 
at the set $T = \{A \in S \mid A \notin A\}$. Observe, however, that if $S$ is the set of all sets, then 
$S \in S$, and yet we saw that that is not possible in our discussion of the Axiom 
of Regularity. Without the set $S$, we cannot use the Axiom of 
Selection to define the set $T$, because that axiom does not allow definitions of the 
form $T = \{A \mid A \notin A\}$. 
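
The role of the Axiom of Selection here can be made concrete. For any set $S$, the set $T = \{A \in S \mid A \notin A\}$ is a legitimate set, but $T$ cannot be an element of $S$, because if $T \in S$ then $T \in T$ if and only if $T \notin T$. The following Python sketch checks this on a small example; it is an informal illustration that models sets as nested `frozenset` values (which can never contain themselves), and the name `russell_subset` is hypothetical.

```python
def russell_subset(s):
    """Return the set of all A in s such that A is not an element of A."""
    return frozenset(a for a in s if a not in a)


empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
s = frozenset({empty, one, two})

t = russell_subset(s)
```

In this model every element of `s` fails to contain itself, so `t` equals `s`, and `t` is not an element of `s`; hence `s` is not a set of all sets.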

Ultimately, the ZF axioms seem reasonable intuitively; they work well in provid- 
ing a framework for set theory as we would want it; the known problems with naive 
set theory have been eliminated by the ZF axioms; and experts in the field have not 
found any new problems that arise out of these axioms. Hence, we can feel confident 
that the ZF axioms are a very reasonable choice as the basis for mathematics. We 
simply cannot do any better than that. 

In practice, most mathematicians who are not logicians use set theory in the in- 
formal and intuitive way that we saw in the previous sections of this chapter; most 
mathematicians accept the fact that set theory seems to work as it is supposed to, and 
do not worry about it beyond that. It is very good that there are logicians who make it 
their business to work out the foundations of mathematics, but most mathematicians 
want to prove theorems in their areas of interest (algebra, analysis, topology, combi- 
natorics, etc.), and spending time worrying about the subtleties of the ZF axioms and 
the like would be too large a distraction. That attitude is certainly recommended for 
the reader (except when you study logic or set theory in courses dedicated to those 
fields), and that is the approach we take in this text. Nonetheless, even if most math- 
ematicians do not explicitly think about the ZF axioms on a daily basis, it is well 
worth knowing that such an axiom system exists, and knowing roughly what it says. 

Although we do not recommend getting too caught up in the details of the ZF 
axioms at this point, there is one additional axiom for set theory with which it is 
worth spending more time, namely, the famous Axiom of Choice. In contrast to the 
axioms of ZF, which arouse little controversy and are used implicitly by most math- 
ematicians, the Axiom of Choice is thought by some to be controversial, and when 
used by mathematicians (and most do use it), it is used much more explicitly than 
the ZF axioms. The Axiom of Choice is often abbreviated as “AC,” and when ZF is 
combined with AC, the resulting collection of axioms is often abbreviated as “ZFC.” 

Intuitively, the Axiom of Choice states that if we have a family of non-empty 
sets, we can simultaneously choose one element from each of the sets in the family. 
For a single non-empty set, there is no problem choosing an element from the set. 
Indeed, we regularly say things such as “let $A$ be a non-empty set, and let $a \in A$.” For 
a finite family of non-empty sets, we can choose an element from the first set, and 
then an element from the second set, and so on, and we will be done after a finite 
number of steps. Again, there is no problem in making such choices. The problem 
arises when we have an infinite family of sets (particularly an uncountable family; 
uncountability will be defined in Section 6.5). From both a practical and a logical 
point of view, we cannot assume that it is possible to perform an infinite number of 
steps one at a time, and expect the process ever to be completed. In particular, we 
cannot choose one element from each set in an infinite family of non-empty sets by 
making the choices one at a time. If we want to choose one element from each set 
in an infinite family of non-empty sets, we need to make the choices simultaneously. 
Such a simultaneous choice is not something we could physically do, and the ability 
to do so mathematically does not follow from the other axioms of set theory. 
Therefore, we need an additional axiom to guarantee our ability to make such choices, and 
that axiom is the Axiom of Choice. 

There are a number of equivalent variants of the Axiom of Choice. The most 
convenient, and useful, way of phrasing the Axiom of Choice is 
with the help of functions, but because we have not defined functions yet, we use the 
following version that is stated strictly in terms of sets. We will restate the Axiom of 
Choice using functions in Section 4.1. Although the problem with choosing elements 
occurs only in infinite families of sets, the Axiom of Choice is stated for all sets in 
order to avoid special cases. 

For the following version of the Axiom of Choice, recall that in the ZF axioms, 
all sets are viewed as families of sets. 


Axiom of Choice Let $x$ be a set. Suppose that $y \neq \emptyset$ for all $y \in x$, and that if 
$y, w \in x$ and $y \neq w$, then $y \cap w = \emptyset$. 
Then there is a set $z$ such that if $y \in x$, then $y \cap z$ contains a single element. 


The set $z$ in the Axiom of Choice contains one element from each set in $x$, and 
these elements can be thought of as having been “chosen” by $z$, which leads to the 
name of the axiom, though of course sets do not actually make choices. The 
requirement that every set in $x$ is non-empty, and that distinct sets in $x$ have empty 
intersection, guarantees that every set in $x$ has 
something in it that can be chosen, and that nothing in $z$ could belong to two different 
sets in $x$, so that there is genuinely one element in $z$ for each set in $x$. 
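
For a finite family of sets of numbers, a set $z$ as in the Axiom of Choice can be written down by an explicit rule, so that the axiom itself is not needed. The following Python sketch is an informal illustration under those assumptions; the names `is_choice_set` and `choose_minimums` are hypothetical, and the rule of taking the least element of each set is just one convenient choice.

```python
def is_choice_set(x, z):
    """Check that the intersection of y and z has exactly one element for every y in x."""
    return all(len(y & z) == 1 for y in x)


def choose_minimums(x):
    """For a finite family x of non-empty sets of numbers, collect the least element of each."""
    return frozenset(min(y) for y in x)


x = frozenset({
    frozenset({1, 2}),
    frozenset({3}),
    frozenset({4, 5, 6}),
})
z = choose_minimums(x)
```

Here `z` is the set with elements 1, 3 and 4, and `is_choice_set(x, z)` is true. If the sets in the family were allowed to overlap, the same rule could fail: for the family consisting of the sets with elements 1, 2 and with elements 2, 3, the least elements are 1 and 2, and both of them belong to the first set.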

For practical applications, it is convenient to reformulate the Axiom of Choice in 
terms of families of sets indexed by a set. We start with the following definition. 


Definition 3.5.1. Let $I$ be a non-empty set, and let $\{A_i\}_{i \in I}$ be a family of sets indexed 
by $I$. The family of sets $\{A_i\}_{i \in I}$ is pairwise disjoint if $i, j \in I$ and $i \neq j$ imply that 
$A_i \cap A_j = \emptyset$. △ 

We now restate the Axiom of Choice as follows. 

Axiom 3.5.2 (Axiom of Choice for Pairwise Disjoint Sets – Family of Sets 
Version). Let $I$ be a non-empty set, and let $\{A_i\}_{i \in I}$ be a family of non-empty sets indexed 
by $I$. Suppose that $\{A_i\}_{i \in I}$ is pairwise disjoint. Then there is a family of sets $\{C_i\}_{i \in I}$ 
such that $C_i \subseteq A_i$ and $C_i$ has exactly one element for all $i \in I$. 


Axiom 3.5.2 has the requirement of pairwise disjoint sets because the original 
statement of the Axiom of Choice did. In practice, however, it is often necessary 
to apply this axiom to families of non-empty sets that are not necessarily pairwise 
disjoint. Fortunately, the following version of the Axiom of Choice, which does not 
assume pairwise disjoint sets, can be deduced from Axiom 3.5.2. A proof of that 
fact is left to the reader in Exercise 3.5.2. We name the following theorem “Axiom 
of Choice” without mentioning the fact that pairwise disjoint sets are not required 
because this version of the Axiom of Choice is the one that is commonly used, and 
it is the one which we will use subsequently. 


Theorem 3.5.3 (Axiom of Choice – Family of Sets Version). Let $I$ be a non-empty 
set, and let $\{A_i\}_{i \in I}$ be a family of non-empty sets indexed by $I$. Then there is a family 
of sets $\{C_i\}_{i \in I}$ such that $C_i \subseteq A_i$ and $C_i$ has exactly one element for all $i \in I$. 


The Axiom of Choice is to be used only when there is no way to avoid it; that is, 
when we need to choose a single element from each set of a family of non-empty sets 
and when there is no explicit procedure for such a choice. This point was described 
amusingly by Bertrand Russell in [Rus19, Chapter 12] as follows. Suppose that a 
millionaire possesses an infinite number of pairs of boots, and an equal number of 
pairs of socks. If the millionaire wants to select one boot from each pair, he can 
prescribe a specific method for doing so, for example by stating that the left boot 
of each pair be chosen; in this case, the Axiom of Choice is not needed. On the 
other hand, if the millionaire wants to select one sock from each pair, he has no way 
to prescribe a specific method for doing so, because the two socks in each pair are 
indistinguishable; hence, for such a selection, an arbitrary choice must be made, and 
formally such a choice uses the Axiom of Choice (though Russell does not phrase it 
exactly that way). 

For a more mathematical example, suppose that we have a family $\{[a_i, b_i]\}_{i \in I}$ of 
closed bounded intervals in $\mathbb{R}$, and suppose that we wanted to choose an element 
from each interval. We would not need to use the Axiom of Choice in this case, 
because we could, for example, choose the smallest element of each interval, which 
is $a_i$. 

One of the reasons we single out the Axiom of Choice from the other axioms of 
set theory is because there are some mathematicians who do not accept the Axiom 
of Choice. It turns out that the Axiom of Choice is independent of the other axioms 
of ZF, and hence it can either be accepted or not without having to change the other 
axioms. This independence, which is due in part to Kurt Gödel in 1938 and in part 
to Paul Cohen in 1963, means that if the ZF axioms are consistent, then so are the 
ZF axioms together with the Axiom of Choice, and so are the ZF axioms together 
with the negation of the Axiom of Choice. For more about the Axiom of Choice, see 
[Mos06, Chapter 8]; [Moo82], which has a very extensive historical discussion; 
and [Pot04, Chapter 14], which has some philosophical discussion. 

The author, and the majority of mathematicians, have no qualms about using the 
Axiom of Choice. Indeed, we will use the Axiom of Choice in a few places in this 
text, for example in the proof of Zorn’s Lemma (Theorem 3.5.6) later in this section, 
and in the proof of Theorem 4.4.5. However, because some mathematicians have 
reservations about the Axiom of Choice, this axiom should always be mentioned 
when it is used. 

In addition to using the Axiom of Choice directly, we will need a technical fact 
about sets that is known as Zorn’s Lemma, and that is equivalent to the Axiom of 
Choice. We start with the following definition. 


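Definition 3.5.4. Let $\mathcal{P}$ be a non-empty family of sets. 


1. Let $M \in \mathcal{P}$. The set $M$ is a maximal element of $\mathcal{P}$ if there is no $Q \in \mathcal{P}$ such 
that $M \subsetneq Q$. 
2. Let $\mathcal{C} \subseteq \mathcal{P}$. The family $\mathcal{C}$ is a chain if $A, B \in \mathcal{C}$ implies $A \subseteq B$ or $A \supseteq B$. 


Intuitively, a chain in $\mathcal{P}$ is a subset of $\mathcal{P}$ for which the elements can be lined up 
in order of inclusion. 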

As we see in the following example, not every family of sets has a maximal 
element. In fact, the point of Zorn’s Lemma is that it gives a criterion that guarantees 
the existence of such an element. Observe that a maximal element of a family of 
sets need not be the largest element of the family; that is, it need not have all the 
other elements of the family as subsets. A maximal element is simply one that is not 
a proper subset of any other element of the family. Also, a family of sets can have 
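more than one maximal element. 


Example 3.5.5. 


(1) Let $\mathcal{P} = \{\{1\}, \{1,2\}, \{1,2,3\}, \{5\}\}$. Then $\mathcal{P}$ has two maximal elements, 
which are $\{1,2,3\}$ and $\{5\}$. There are nine chains in $\mathcal{P}$, which are 


$\emptyset$, $\{\{1\}\}$, $\{\{1,2\}\}$, $\{\{1,2,3\}\}$, $\{\{5\}\}$, 
$\{\{1\}, \{1,2\}\}$, $\{\{1\}, \{1,2,3\}\}$, $\{\{1,2\}, \{1,2,3\}\}$, 
$\{\{1\}, \{1,2\}, \{1,2,3\}\}$. 


(2) Let $\mathcal{Q} = \mathcal{P}(\mathbb{N})$, let $\mathcal{S}$ denote the family of all finite subsets of $\mathbb{N}$ and let $\mathcal{C} = 
\{\{2\}, \{2,4\}, \{2,4,6\}, \ldots\}$. Then $\mathcal{C}$ is a chain in each of $\mathcal{Q}$ and $\mathcal{S}$. Clearly $\bigcup_{C \in \mathcal{C}} C = 
\{2, 4, 6, \ldots\}$. Then $\bigcup_{C \in \mathcal{C}} C \in \mathcal{Q}$, but $\bigcup_{C \in \mathcal{C}} C \notin \mathcal{S}$. Observe also that $\mathcal{Q}$ has a maximal 
element, which is $\mathbb{N}$, whereas $\mathcal{S}$ has no maximal element, because any finite subset 
of $\mathbb{N}$ is a proper subset of many other finite subsets of $\mathbb{N}$. ♦ 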
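
The counts in Part (1) of this example can be confirmed by brute force, because the family there is finite. The following Python sketch is an informal check that models sets as `frozenset` values; the names `is_chain`, `chains` and `maximal_elements` are our own.

```python
from itertools import combinations


def is_chain(sets):
    """Check that any two of the given sets are comparable under inclusion."""
    return all(a <= b or b <= a for a, b in combinations(sets, 2))


def chains(family):
    """List every subfamily of the finite family that is a chain."""
    members = list(family)
    return [
        frozenset(subfamily)
        for size in range(len(members) + 1)
        for subfamily in combinations(members, size)
        if is_chain(subfamily)
    ]


def maximal_elements(family):
    """The members of the family that are not proper subsets of any other member."""
    return {m for m in family if not any(m < q for q in family)}


P = frozenset({
    frozenset({1}),
    frozenset({1, 2}),
    frozenset({1, 2, 3}),
    frozenset({5}),
})
```

Here `len(chains(P))` is 9, where the empty subfamily is counted as a chain, and `maximal_elements(P)` consists of the two sets with elements 1, 2, 3 and with element 5.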


In Example 3.5.5 (2) we observe that in $\mathcal{Q}$, the union of the elements of the chain 
$\mathcal{C}$ is in $\mathcal{Q}$, and that $\mathcal{Q}$ has a maximal element; in $\mathcal{S}$, by contrast, neither of these facts 
holds. Zorn’s Lemma, which we now state, shows that the situation just observed 
is no coincidence. More specifically, Zorn’s Lemma says that if a family of sets 
contains the union of the elements of each chain in the family, then the family has a 
maximal element. 


Theorem 3.5.6 (Zorn’s Lemma). Let $\mathcal{P}$ be a non-empty family of sets. Suppose that 
for each chain $\mathcal{C}$ in $\mathcal{P}$, the set $\bigcup_{C \in \mathcal{C}} C$ is in $\mathcal{P}$. Then $\mathcal{P}$ has a maximal element. 


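Proof. Suppose that $\mathcal{P}$ does not have a maximal element. Then for every $A \in \mathcal{P}$, the 
set $T_A = \{Q \in \mathcal{P} \mid A \subsetneq Q\}$ is non-empty. Therefore $\{T_A\}_{A \in \mathcal{P}}$ is a family of non-empty 
sets. By the Axiom of Choice (Theorem 3.5.3) there is a family of sets $\{F_A\}_{A \in \mathcal{P}}$ such 
that $F_A \subseteq T_A$ and $F_A$ has exactly one element for all $A \in \mathcal{P}$. For each $A \in \mathcal{P}$, let $S_A$ be 
the single element in $F_A$, and then $S_A \in \mathcal{P}$ and $A \subsetneq S_A$ for all $A \in \mathcal{P}$. 

Let $\mathcal{R} \subseteq \mathcal{P}$. We say that the family $\mathcal{R}$ is chain-closed if for each chain $\mathcal{C}$ in $\mathcal{R}$, 
the set $\bigcup_{C \in \mathcal{C}} C$ is in $\mathcal{R}$. 

By hypothesis the family $\mathcal{P}$ is chain-closed. Let $\mathcal{M}$ be the intersection of all 
chain-closed families in $\mathcal{P}$. Let $\mathcal{C}$ be a chain in $\mathcal{M}$. Then $\mathcal{C}$ is a chain in $\mathcal{R}$ for all 
chain-closed families $\mathcal{R} \subseteq \mathcal{P}$, and hence $\bigcup_{C \in \mathcal{C}} C \in \mathcal{R}$ for all chain-closed families 
$\mathcal{R} \subseteq \mathcal{P}$, and therefore $\bigcup_{C \in \mathcal{C}} C \in \mathcal{M}$. Hence $\mathcal{M}$ is chain-closed. 

Observe that $\emptyset$ is a chain in $\mathcal{M}$, and that $\bigcup_{C \in \emptyset} C = \emptyset$. Hence $\emptyset \in \mathcal{M}$, which means 
that $\mathcal{M} \neq \emptyset$. 

Let $A \in \mathcal{P}$. Let $\mathcal{A} = \{X \in \mathcal{P} \mid S_A \subseteq X\}$. Then $\mathcal{A} \subseteq \mathcal{P}$. Let $\mathcal{C}$ be a chain in $\mathcal{A}$. 
Then $\mathcal{C}$ is a chain in $\mathcal{P}$, and hence $\bigcup_{C \in \mathcal{C}} C \in \mathcal{P}$ by hypothesis. If $C \in \mathcal{C}$ then $S_A \subseteq C$, 
and hence $S_A \subseteq \bigcup_{C \in \mathcal{C}} C$. Therefore $\bigcup_{C \in \mathcal{C}} C \in \mathcal{A}$. It follows that $\mathcal{A}$ is chain-closed. 
Therefore $\mathcal{M} \subseteq \mathcal{A}$. However, we note that $A \notin \mathcal{A}$, because otherwise we would have 
$S_A \subseteq A$, which would contradict the fact that $A \subsetneq S_A$. Hence $A \notin \mathcal{M}$. 

Because $\mathcal{M} \subseteq \mathcal{P}$, and because $A \notin \mathcal{M}$ for all $A \in \mathcal{P}$, we deduce that $\mathcal{M} = \emptyset$, which 
is a contradiction. We conclude that $\mathcal{P}$ must have a maximal element. 

The statement of Zorn’s Lemma in Theorem 3.5.6 is not the most general form 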
of the Lemma. The most general version is stated in terms of partially ordered sets 
(also called posets), which we define in Section 7.4, rather than the more narrow 
context of set inclusion. Moreover, the most general version requires only that every 
chain has an upper bound, not necessarily a least upper bound; see Exercise 3.5.7 
for the definitions of these terms in the context of set inclusion, and Section 7.4 for 
the definitions for posets. However, even though our version of Zorn’s Lemma is not 
the strongest possible version, it suffices for our purposes, and it is easier to prove. 
Moreover, it turns out that our version of Zorn’s Lemma is actually equivalent to the 
more general version; see [RR85, Section I.4] for details. Hence, we name Theo- 
rem 3.5.6 “Zorn’s Lemma” without mentioning the fact that its statement is weaker 
than other versions. Also, we remark that Zorn’s Lemma is not really a lemma, but is 
rather a very important theorem; the name of this theorem is standard, however, and 
we will stick with it. 

In the proof of Zorn’s Lemma (Theorem 3.5.6) we used the Axiom of Choice 
explicitly by writing out the appropriate family of sets. In practice, however, for 
the sake of not overly burdening the reader with unnecessary details, most proofs 
involving the Axiom of Choice use that axiom in a less formal manner, by simply 
saying that we are choosing something, but without explicitly writing things out in 
terms of sets, and sometimes without even mentioning the Axiom of Choice at all. 
For example, a more typical way of defining $S_A$ in the proof of Zorn’s Lemma would
be: “Suppose that $\mathcal{P}$ does not have a maximal element. Then for every $A \in \mathcal{P}$, there
is some $Q \in \mathcal{P}$ such that $A \subsetneq Q$, and we let $S_A = Q$; if there is more than one such $Q$,
we let $S_A$ be any choice of such $Q$.” We will also use this informal style of invoking
the Axiom of Choice when we use it subsequently. 

Would it have been possible to prove Zorn’s Lemma without invoking the Axiom 
of Choice? The answer is no, because Zorn’s Lemma is equivalent to the Axiom 
of Choice, by which we mean that if the ZF axioms are assumed together with the 
Axiom of Choice, it is possible to prove Zorn’s Lemma (as we have seen), and if 
the ZF axioms are assumed together with Zorn’s Lemma, it is possible to prove the 
Axiom of Choice (as will be seen in Exercise 4.1.11). Given that the Axiom of Choice 
is independent of the ZF axioms, which implies that it is not possible to deduce the 
Axiom of Choice from only the ZF axioms, it is therefore also not possible to deduce 
Zorn’s Lemma from only the ZF axioms. Hence, we cannot avoid using the Axiom 
of Choice, or something else equivalent to it, in the proof of Zorn’s Lemma. It is a 
matter of convenience—and choice—which of these two facts is taken as an axiom, 
and which is to be deduced. The Axiom of Choice is much more intuitively appealing 
than Zorn’s Lemma, and that is perhaps one of the reasons why the former is more 
often taken axiomatically. On the other hand, there are situations where Zorn’s Lemma is
easier to use directly than the Axiom of Choice, for example in the proof of the 
Trichotomy Law for Sets (Theorem 6.5.13). 

Besides being of great mathematical use, the equivalence of the Axiom of Choice 
and Zorn’s Lemma also explains the following well-known joke (well-known among 
mathematicians, at least). Question: What is yellow and equivalent to the Axiom of 
Choice? Answer: Zorn’s lemon. 

There are also a number of other important facts in mathematics that are equivalent
to the Axiom of Choice, a few of which are the following. See [RR85] for an
extremely extensive list of statements that are equivalent to the Axiom of Choice. 


1. The Trichotomy Law for Sets. See Theorem 6.5.13 for the statement of this 
theorem and a proof of it using Zorn’s Lemma, and see [Sto79, Section 2.9] 
or [RR85, Section I.3] for the other implication. 


2. The Well-Ordering Theorem. This theorem states that for any set, there is an 
order relation on the set that is well-ordered, which means that every non-empty subset
has a least element. (An order relation is, informally, a relation that behaves
similarly to the standard order on R; a formal definition of an order relation is 
given in Section 7.4, where it is called a “total ordering” to distinguish it from 
a “partial ordering.”) The standard order on N is well-ordered by the Well- 
Ordering Principle (Theorem 6.2.5), but the standard order on R is certainly 
not well-ordered. The Well-Ordering Theorem implies that in principle there 
is some other order on R that is well-ordered, though there does not appear 
to be a concrete description of such an order. A proof that the Well-Ordering 
Theorem implies the Axiom of Choice may be found in Exercise 7.4.18; a 
proof of the other implication may be found in [HJ99, Section 8.1] or [Sto79, 
Section 2.9]. 


3. If $\{A_i\}_{i \in I}$ is a family of non-empty sets indexed by $I$, then the product $\prod_{i \in I} A_i$
is not empty. See Section 4.5 for the definition of the product of a family of 
sets, and discussion of the equivalence of this fact with the Axiom of Choice. 


4. For any infinite set A, the set A x A has the same cardinality as A. See Sec- 
tion 6.5 for the definition of infinite sets, and the definition of two sets hav- 
ing the same cardinality. This result is certainly not true for finite sets. See 
[Sto79, Sections 2.9 and 2.10] or [RR85, Section I.7] for details. 


5. Any surjective function has a right inverse. See Section 4.4 for the definition 
of surjectivity, see Theorem 4.4.5 (1) for a proof that the Axiom of Choice 
implies this fact, and see Exercise 4.4.19 for the other implication. 


We conclude this section with two quotes illustrating the controversy and confu- 
sion surrounding the Axiom of Choice. 

As mentioned in Item (4) above, the statement “if A is an infinite set then A x A 
has the same cardinality as A” implies the Axiom of Choice. This fact is due to 
Alfred Tarski. In [Myc06] it is related: “Tarski told me the following story. He tried 
to publish his theorem ... in the Comptes Rendus Acad. Sci. Paris but Fréchet and 
Lebesgue refused to present it. Fréchet wrote that an implication between two well 
known propositions is not a new result. Lebesgue wrote that an implication between 
two false propositions is of no interest. And Tarski said that after this misadventure 
he never tried to publish in the Comptes Rendus.” It should be noted that Tarski, 
Fréchet and Lebesgue are all very important mathematicians of their era, and yet 
they had very different views about the Axiom of Choice. 

Finally, the following widely cited quote, found among other places at [Schl],
is due to Jerry Bona: “The Axiom of Choice is obviously true; the Well Ordering 
Principle is obviously false; and who can tell about Zorn’s Lemma?” This quote is 
amusing precisely because it captures how difficult it is to be sure that the Axiom of 
Choice is true, because even though it seems very appealing intuitively, it is known 
to be equivalent to statements that are much less self-evident. 


Exercises 


Exercise 3.5.1. For each of the following families of intervals in R, suppose that we 
wanted to choose an element from each interval simultaneously. Would we need to 
use the Axiom of Choice? 


(1) Let $\{(a_i, b_i)\}_{i \in I}$ be a family of non-degenerate open bounded intervals in $\mathbb{R}$.
(2) Let $\{(c_i, \infty)\}_{i \in I}$ be a family of open unbounded intervals in $\mathbb{R}$.


Exercise 3.5.2. [Used in Section 3.5 and Exercise 4.4.19.] Prove that the version of the 
Axiom of Choice that assumes pairwise disjoint sets (Axiom 3.5.2) implies the ver- 
sion of the Axiom of Choice that does not make such an assumption (Theorem 3.5.3). 
The idea of the proof is that if we are given a non-empty set $J$, and a (not necessarily
pairwise disjoint) family of sets $\{B_j\}_{j \in J}$ indexed by $J$, we can form a new family of
sets $\{D_j\}_{j \in J}$ defined by $D_j = \{(x, j) \mid x \in B_j\}$ for all $j \in J$.


Exercise 3.5.3. Let $\mathcal{P} = \{\{1\}, \{2\}, \{1,2\}, \{2,3\}, \{1,2,3\}\}$. List all the chains in $\mathcal{P}$.


Exercise 3.5.4. Let $\mathcal{P}$ be a non-empty family of sets, and let $\mathcal{C}$ be a non-empty chain
in $\mathcal{P}$. Suppose that $C \neq \emptyset$ for all $C \in \mathcal{C}$. Is $\bigcap_{C \in \mathcal{C}} C$ always non-empty? Give a proof
or a counterexample.


Exercise 3.5.5. Let $\mathcal{P}$ and $\mathcal{Q}$ be non-empty families of sets, and let $\mathcal{C} \subseteq \mathcal{P}$ and $\mathcal{D} \subseteq \mathcal{Q}$
be chains. Is $\mathcal{C} \times \mathcal{D}$ always a chain in $\mathcal{P} \times \mathcal{Q}$? Give a proof or a counterexample.


Exercise 3.5.6. Let $\mathcal{P}$ be a non-empty family of sets, let $I$ be a non-empty set and let
$\{\mathcal{C}_i\}_{i \in I}$ be a family of chains in $\mathcal{P}$.


(1) Is $\bigcap_{i \in I} \mathcal{C}_i$ always a chain in $\mathcal{P}$? Give a proof or a counterexample.
(2) Is $\bigcup_{i \in I} \mathcal{C}_i$ always a chain in $\mathcal{P}$? Give a proof or a counterexample.


Exercise 3.5.7. [Used in Section 3.5.] Let $\mathcal{P}$ be a non-empty family of sets, let $\mathcal{C} \subseteq \mathcal{P}$
be a chain and let $A \in \mathcal{P}$. The set $A$ is an upper bound of $\mathcal{C}$ if $X \subseteq A$ for all $X \in \mathcal{C}$.
The set $A$ is a least upper bound of $\mathcal{C}$ if it is an upper bound of $\mathcal{C}$, and $A \subseteq Z$ for any
other upper bound $Z$ of $\mathcal{C}$.


(1) Suppose that $\mathcal{C}$ has a least upper bound in $\mathcal{P}$. Prove that the least upper bound
is unique.

(2) Suppose that $\bigcup_{C \in \mathcal{C}} C \in \mathcal{P}$. Must it be the case that $\bigcup_{C \in \mathcal{C}} C$ is the least upper
bound of $\mathcal{C}$? Give a proof or a counterexample.

(3) Suppose that $\mathcal{C}$ has a least upper bound in $\mathcal{P}$. Must it be the case that the least
upper bound equals $\bigcup_{C \in \mathcal{C}} C$? Give a proof or a counterexample.

(4) Give an example of a non-empty family $\mathcal{Q}$ of subsets of $\mathbb{R}$, and a chain $\mathcal{D} \subseteq \mathcal{Q}$,
such that $\mathcal{D}$ has an upper bound in $\mathcal{Q}$ but not a least upper bound.


4 


Functions 


A function is the abstract image of the dependence of one magnitude on 
another. 
— A.D. Aleksandrov (1912-1999) 


4.1 Functions 


The reader has encountered functions repeatedly in previous mathematics courses. In 
high school one learns about polynomial, exponential, logarithmic and trigonomet- 
ric functions, among others. Although logarithms and trigonometry are often first 
encountered without thinking about functions (for example, sines and cosines are 
thought of in terms of solving right triangles), in calculus courses and above the 
focus shifts to the point of view of functions (for example, thinking of sine and co- 
sine as functions defined on the entire real number line). The operation of taking 
a derivative, for example, is something that is done to functions. In applications of 
calculus, such as in physics or chemistry, it is crucial to think of exponentials, sines 
and cosines as functions. For example, sine and cosine functions are used to describe 
simple harmonic motion. 

In modern mathematics, where we make use of set theory, functions play an even 
more important role than in calculus. For example, if we want to compare two sets to 
see if they have the same size (as discussed in Section 6.5), we use functions between 
the sets. In group theory, if we want to show that two groups are essentially the same, 
we use certain types of functions between groups, as discussed briefly in Section 7.3. 
The same idea holds in many other branches of modern mathematics. 

But what is a function really? We all have an intuitive idea of what a function is, 
usually something of the form $f(x) = x^2$. However, a function need not be described
by a formula, nor need it deal with numbers at all. For example, we can form the 
function that assigns to each person her biological mother; there is certainly no nu- 
merical formula that describes this function. We can think of a function informally as 
a machine, where the input is placed in an opening at the top, and for each object that 
is put in, the machine spits a corresponding object out. See Figure 4.1.1. For example,
if a function is given by the formula $f(x) = x^2$, then the machine takes numbers
as input, and if we put $a = 5$ into the machine, then it will spit out $f(a) = 25$.


Fig. 4.1.1. 


You may have seen a definition of functions that looks something like “a function 
is a rule of assignment that assigns to each member of one set a unique member of 
another set.” Such a definition is often given in introductory calculus classes, and 
there is nothing blatantly incorrect about it, but it does not really say anything either. 
What is a “rule of assignment?” Well, it is a function—but then we are going in 
circles. 

To get out of the above predicament, we give a definition of functions in terms 
of sets. This rigorous definition will be seen to fit our intuitive picture of functions 
quite nicely; we cannot formally prove that this definition is identical to our intuitive 
notion, because formal proofs cannot be applied to informal concepts. To get a feel 
for our definition, given below, let us consider the function that assigns to each person 
her biological mother. Although there is no numerical formula that describes this 
function, we can, however, completely specify this function in a different way, which 
is by means of a two-column list, where on the left-hand side we list all the people 
in the world, and on the right-hand side we list each person’s mother. Part of this list
would be

person        | person’s mother
Fred Smith    | Mary Smith
Susan Levy    | Miriam Cohen
Joe al-Haddad | Maryam Mansur


Even for functions that are described by nice formulas, we can also think of them 
as given by lists. Consider the function defined by the formula $f(x) = x^2$ for all
integers $x$. We can make a list for this function, part of which would be

$x$  | $x^2$
0  | 0
1  | 1
-1 | 1
2  | 4
-2 | 4
...

Of course, the list for this function is infinite, so we cannot physically write it all 
down, but in principle such a list could be made. 

By thinking of functions as lists, we have a uniform way of treating all functions, 
whether given by formulas or not. To make this approach more compatible with set 
theory, we make one modification. Instead of using a list consisting of two columns, 
we could use a one-column list, where each entry in the new list is an ordered pair 
representing the corresponding row of the original two-column list. So, for the function
defined by $f(x) = x^2$ for all integers $x$, we have an infinite list of pairs, containing
$(2,4)$, $(-2,4)$, $(5,25)$, and so on. For any given integer $c$, there will be one and only
one ordered pair in this list that has $c$ in its left-hand slot, namely, the pair $(c, c^2)$.
On the other hand, the number $c^2$ appears in the right-hand slot of two pairs, which
are $(c, c^2)$ and $(-c, c^2)$, unless $c = 0$. In fact, once we have the idea of representing a
function by ordered pairs, we do not need to think of the collection of ordered pairs 
as being written in a list, but rather, we simply think of it as a set of ordered pairs, as 
we now see formally in the following definition. 


Definition 4.1.1. Let $A$ and $B$ be sets. A function (also called a map) $f$ from $A$ to $B$,
denoted $f\colon A \to B$, is a subset $F \subseteq A \times B$ such that for each $a \in A$, there is one and
only one pair in $F$ of the form $(a,b)$. The set $A$ is called the domain of $f$ and the set
$B$ is called the codomain of $f$. △


Definition 4.1.1 is stated entirely in terms of sets, which shows that once we 
accept set theory as the basis of mathematics, then the use of functions requires no 
additional hypotheses. 

It is important to observe that a function consists of three things: a domain, a 
codomain, and a subset of the product of the domain and the codomain satisfying a 
certain condition. Indeed, one way of defining a function is as a triple of sets $(A, B, F)$
where $F$ is a subset of $A \times B$ that satisfies the conditions given in Definition 4.1.1.
However, we avoid writing this cumbersome triple notation by observing that in
Definition 4.1.1 every function is defined as being from a set $A$ to a set $B$, denoted
$f\colon A \to B$, and therefore the domain and the codomain of a function are always
specified in the definition of the function. Hence, to define a function properly, it
is necessary to say “let $A$ and $B$ be sets, and let $f\colon A \to B$ be a function.” We will
sometimes be more concise and just say “let $f\colon A \to B$ be a function,” where it is
understood from the notation that $A$ and $B$ are sets. It will not suffice, however, to
write only “let f be a function” without specifying the domain and codomain, unless 
the domain and codomain are known from the context. 

The need to specify the domain and codomain of a function when defining a
function is not a mere formality, but a necessity when treating functions rigorously.
For example, consider the set $F = \{(n, n^2) \mid n \in \mathbb{Z}\}$. The set $F$ is a subset of $\mathbb{Z} \times \mathbb{Z}$
that satisfies the conditions given in Definition 4.1.1, and hence $F$ can be thought
of as defining a function $\mathbb{Z} \to \mathbb{Z}$. However, the set $F$ is also a subset of $\mathbb{Z} \times \mathbb{R}$
that satisfies the conditions in the definition of a function, and hence $F$ can be thought
of as defining a function $\mathbb{Z} \to \mathbb{R}$. Such ambiguity is not acceptable when we use
functions in rigorous proofs, and so the domain and codomain of a function must be 
specified as part of the definition of a function. 


Example 4.1.2. 


(1) Let $A$ and $B$ be sets. A function from $A$ to $B$ is a subset of $A \times B$. When the sets
$A$ and $B$ are finite, rather than thinking of such a subset of $A \times B$ in terms of ordered
pairs of the form $(a,b)$, where $a \in A$ and $b \in B$, we can think of the subset graphically
in terms of a diagram with the sets $A$ and $B$ and arrows from certain elements of $A$ to
certain elements of $B$, where there is an arrow from $a$ to $b$ when $(a,b)$ is in the subset.
For example, let $A = \{a,b,c,d\}$ and $B = \{1,2,3,4\}$. Two diagrams with arrows from
$A$ to $B$ are seen in Figure 4.1.2. In Part (i) of the figure the diagram corresponds to the
subset $\{(a,2), (b,1), (c,4), (d,4)\} \subseteq A \times B$, and this subset is a function; in Part (ii)
of the figure, the corresponding subset of $A \times B$ is $\{(a,1), (a,2), (b,3), (c,4)\}$, and it
is not a function, because $a$ appears in two pairs and $d$ appears in none.


Fig. 4.1.2. Arrow diagrams from $A$ to $B$. (i) $a \to 2$, $b \to 1$, $c \to 4$, $d \to 4$. (ii) $a \to 1$, $a \to 2$, $b \to 3$, $c \to 4$.


(2) A “rule of assignment” is given by assigning to each person her sister. Is 
this rule a function? The answer depends upon the choice of domain and codomain, 
which we have been sloppy in not stating. If the domain is all people, then we cer- 
tainly do not have a function, because not everyone has a sister. Even if we restrict 
the domain to all people with sisters there is a problem, because some people have 
more than one sister, and we do not know which sister is being assigned. Therefore 
we need to restrict the domain even further to all people with precisely one sister. 
As for the codomain, it needs to be a set that contains at least all those women who 
have siblings, and it could be any choice of such a set (different choices give rise to
different functions). 

(3) Consider the formula $f(x) = \sqrt{x^2 - 5x + 6}$. On its own, this formula does not
properly define a function, because we are not given a domain and codomain. It is
standard, however, when given a formula such as this to take as its domain the largest
subset of $\mathbb{R}$ that can serve as a domain; in this case the set $(-\infty, 2] \cup [3, \infty)$ is taken
as the domain. The codomain might as well be taken to be $\mathbb{R}$, though various subsets
of $\mathbb{R}$ could be taken as the codomain as well, for example $[-17, \infty)$. ◊
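
For finite sets, the condition in Definition 4.1.1 can be checked mechanically. The following Python sketch is only an illustration and is not part of the formal development; the name `is_function` is hypothetical, and the data are the two subsets described in Part (1) of Example 4.1.2.

```python
def is_function(A, B, F):
    """Check the condition of Definition 4.1.1 for finite sets A, B and F.

    F must be a subset of A x B in which every element of A appears as the
    first coordinate of exactly one pair.
    """
    # F must be a subset of the product A x B.
    if any(a not in A or b not in B for (a, b) in F):
        return False
    # Each a in A must occur as a first coordinate exactly once.
    return all(sum(1 for (x, _) in F if x == a) == 1 for a in A)


A = {"a", "b", "c", "d"}
B = {1, 2, 3, 4}

F_i = {("a", 2), ("b", 1), ("c", 4), ("d", 4)}   # Part (i) of the figure
F_ii = {("a", 1), ("a", 2), ("b", 3), ("c", 4)}  # Part (ii) of the figure

print(is_function(A, B, F_i))   # True
print(is_function(A, B, F_ii))  # False: a occurs twice and d not at all
```

The check only makes sense for finite sets that can be listed explicitly; for infinite sets the condition has to be proved rather than tested.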


We defined functions in terms of sets in Definition 4.1.1, but we can in fact recover
the intuitive “rule of assignment” approach to functions. Let $f\colon A \to B$ be a
function. Then for each $a \in A$ there is one and only one pair of the form $(a,b)$ in the
subset $F \subseteq A \times B$ that defines the function. In other words, for each $a \in A$ there is a
unique corresponding $b \in B$, where this $b$ is the unique element of $B$ such that the
pair $(a,b)$ is in $F$. We could then define the term “$f(a)$” (which was not mentioned
in our definition of functions) to be $f(a) = b$, where $b$ is as just stated. Hence our formal
definition of functions leads to the more usual notation for functions, and so we
can now revert to using the more usual notation, though with one important caveat,
which we now state.
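
The passage from the set $F$ to the notation $f(a)$ can be mimicked for finite sets. The following is a minimal Python sketch, assuming that `F` is a finite set of pairs satisfying Definition 4.1.1; the helper name `value_at` is hypothetical.

```python
def value_at(F, a):
    """Return the unique b such that (a, b) is in F.

    Assumes F satisfies Definition 4.1.1, so exactly one such pair exists.
    """
    matches = [b for (x, b) in F if x == a]
    if len(matches) != 1:
        raise ValueError("F does not define a function at this element")
    return matches[0]


# The pairs of the squaring function on a few integers.
F = {(n, n * n) for n in range(-3, 4)}

print(value_at(F, -2))  # 4
print(value_at(F, 3))   # 9
```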

The use of the “$f(x)$” notation, though legitimate when used properly, often leads
to a very common mistake. It is customary in elementary courses (such as calculus)
to write phrases such as “let $f(x)$ be a function.” Such a phrase, however, is not
technically valid. If $f\colon A \to B$ is a function, then the name of the function is “$f$,” not
“$f(x)$.” The notation “$f(x)$” means the value of the function $f$ at the element $x$ in the
domain; therefore $f(x)$ is an element of the codomain $B$, rather than the name of the
function.

It is often mistakenly thought that “$f(x)$” is the name of the function because $x$ is
a “variable,” rather than a specific element of the domain. In reality, however, there
is no such thing as a variable in a function. It would be commonly understood that
the notation “$f(c)$” denotes the value of the function $f$ at the element $c$ in the domain,
and so $f(c)$ is an element of the codomain. Why should “$f(x)$” mean anything
different from “$f(c)$,” except that $c$ is one choice of element in the domain, and $x$ is
another such element? Historically, following Descartes, mathematicians have often
used letters such as x, y and z to denote “variables,” and letters such as a, b and c 
to denote “constants,” but from a rigorous standpoint there is no such distinction. 
In careful mathematical writing, we always use the notation f to denote the name 
of the function, and the notation f(x) to denote an element of the codomain. This 
distinction between f and f(x) might seem to be an overly picky technicality, but it 
is in fact nothing of the sort. A careless approach in this matter can lead to definite 
misunderstandings in some tricky situations, such as in Section 4.5. 

The proper way to define a function is to state its domain and its codomain, and 
to state what the function “does” to each element of the domain (which is really 
the same as defining an appropriate subset of the product of the domain and the 
codomain). For example, we might write “let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \cos x$
for all $x \in \mathbb{R}$.” The phrase “for all $x \in \mathbb{R}$” is crucial, and the definition would not be
correct without it. All the more so, simply stating “let $f(x) = \cos x$” does not define
a function. A proper definition of a function based upon a formula must include both 
the domain and codomain of the function, and it must quantify the “variable” in the 
formula. We cannot assume that x “ranges over the whole domain” just because it 
is the letter x. We need the quantifier to tell us which elements of the domain are 
treated by the formula. Hence the entire statement of the definition of f given above 
is necessary. 

Having just said that it is not correct to present a function by simply writing a 
formula, there are some situations in which presentations of functions by formulas 
are considered acceptable. If, in a given context, the domain and codomain can be 
plausibly guessed, then giving a formula can be sufficient. For example, in an introductory
calculus class, we might be given a formula such as $f(x) = \sqrt{x^2 - 5x + 6}$.
Because the functions considered in introductory calculus virtually all have domains
and codomains that are subsets of $\mathbb{R}$, we could follow the standard practice, as in
Example 4.1.2 (3), and take $(-\infty, 2] \cup [3, \infty)$ as the domain, and $\mathbb{R}$ as the codomain.
However, because we now wish to attain a higher level of rigor than is found in 
more elementary mathematics courses, it is usually best to avoid all such informal 
conventions concerning definitions of functions, and give truly proper definitions, as 
discussed in the previous paragraph. 
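
The domain $(-\infty, 2] \cup [3, \infty)$ used for the formula above can be checked by factoring the expression under the square root. The following short computation is a sketch of that check; it uses only the fact that a product of two real numbers is non-negative precisely when both factors are non-negative or both are non-positive.

```latex
\[
x^2 - 5x + 6 = (x - 2)(x - 3),
\]
\[
(x - 2)(x - 3) \ge 0
\iff x \le 2 \ \text{or}\ x \ge 3
\iff x \in (-\infty, 2] \cup [3, \infty).
\]
```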

Not all functions, even with domain and codomain equal to R, can be defined by 
a numerical formula. Even when a function is defined by a formula, it is not always 
possible to use a single formula, and sometimes the formula must be given in cases. 
Consider, for example, the function $f\colon \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ -1, & \text{if } x < 0. \end{cases}$$


In general, a function can be presented by breaking up the domain as the union of two 
or more pairwise disjoint subsets, and defining the function on each of the subsets. 
To see the subsets used for the above function f, we mention that a more proper, 
though less pleasant and less commonly used, way to write the above formula would 
be 

$$f(x) = \begin{cases} x, & \text{if } x \in [0, \infty) \\ -1, & \text{if } x \in (-\infty, 0). \end{cases}$$


Whereas the most sensible way to break up the domain of a function is into 
pairwise disjoint subsets, it is sometimes more convenient to break up the domain 
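into subsets that are not disjoint. For example, we might define a function $g\colon \mathbb{R} \to \mathbb{R}$
by

$$g(x) = \begin{cases} x^2, & \text{if } x \geq 3 \\ x + 6, & \text{if } x \leq 3. \end{cases}$$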


In contrast to the case in which the domain is broken up into pairwise disjoint sub- 
sets, where there is nothing to check, in this case we must verify that the formulas 
for the two subsets agree when evaluated at the element common to both subsets 
(which is $x = 3$). Everything works out fine, because $3^2 = 9$ and $3 + 6 = 9$, and so
the way we presented g makes sense. This situation is usually expressed by saying 
that the function g is well-defined. On the other hand, if a function is presented with 
overlapping subsets, and if the formulas do not agree on the overlap, then we do not 
have a function at all. 
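
The verification that $g$ is well-defined can be written out as a small computation. The Python sketch below is an illustration only; it assumes the two formulas are $x^2$ on $[3, \infty)$ and $x + 6$ on $(-\infty, 3]$, as the check $3^2 = 9$ and $3 + 6 = 9$ indicates, and the names `g_upper`, `g_lower` and `bad_lower` are hypothetical.

```python
def g_upper(x):
    return x * x      # formula used on the subset [3, oo)


def g_lower(x):
    return x + 6      # formula used on the subset (-oo, 3]


# The two subsets overlap only at x = 3, so that is the only point to check.
overlap = [3]
well_defined = all(g_upper(x) == g_lower(x) for x in overlap)
print(well_defined)  # True


def g(x):
    """Evaluate g; either formula may be used at x = 3 because they agree there."""
    return g_upper(x) if x >= 3 else g_lower(x)


# A different lower formula that disagrees on the overlap does not give a function.
def bad_lower(x):
    return x + 5


print(all(g_upper(x) == bad_lower(x) for x in overlap))  # False
```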

The concept of a function being well-defined has a more general meaning than 
what we stated above. In general, a function is said to be well-defined if there is 
some potential problem with the definition of the function, and it turns out that the 
problem does not in fact occur. In practice, saying that a function is well-defined 
usually means that one of two things has been verified: that every element of the 
domain is indeed taken into the codomain, or that the function has only one value 
for every element of the domain. Our use of the term well-defined in the previous 
paragraph is of the second type. An example of the first type of use of the term 
well-defined occurs in Exercise 4.4.8. 

A look at the special case of functions $\mathbb{R} \to \mathbb{R}$ can help us gain some insight into
functions generally. Let $f\colon \mathbb{R} \to \mathbb{R}$ be a function. Then $f$ gives rise to a graph in
$\mathbb{R}^2$, where the graph consists of all points in $\mathbb{R}^2$ of the form $(x, f(x))$, where $x \in \mathbb{R}$.
For each such $x$, the definition of functions implies that there is one and only one
corresponding value $f(x) \in \mathbb{R}$. Hence, for each $x \in \mathbb{R}$ there is one and only one point
on the graph of $f$ that is on the vertical line through $x$. See Figure 4.1.3. Conversely,
suppose that we are given a curve in $\mathbb{R}^2$. Is this curve necessarily the graph of some
function $g\colon \mathbb{R} \to \mathbb{R}$? If the curve has the property that it intersects each vertical line
in the plane at precisely one point, then the curve will be the graph of some function
$g\colon \mathbb{R} \to \mathbb{R}$; if this property does not hold, then the curve will not be the graph of such
a function.


Fig. 4.1.3. 


As we noted earlier, a function consists of three things: a domain, a codomain 
and a subset of the product of the domain and the codomain satisfying a certain 
condition. For two functions to be considered equal, they need to have all three of
these things be the same. If even one of these three things is changed, a different
function is obtained. For example, the function $f\colon \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2 + 1$
for all $x \in \mathbb{R}$ is not the same function as $g\colon \mathbb{R} \to (0, \infty)$ defined by $g(x) = x^2 + 1$ for
all $x \in \mathbb{R}$, even though they both have the same formula and the same domain.

Let $f\colon A \to B$ and $g\colon C \to D$ be functions. To say that “$f = g$” means that $A = C$,
that $B = D$ and that the two functions correspond to the same subset of $A \times B$. This
last statement can be rephrased by saying that $f(x) = g(x)$ for all $x \in A$. Observe that
the statement “f(x) = g(x) for all x € A” is not a statement about equivalent formulas 
for f and g, because the functions f and g might not be given by formulas at all, but 
is rather a statement about the equality of various elements in the codomain. That is, 
a single statement about functions, namely, the statement f = g, is equivalent to a 
collection of statements about elements in the codomain (once it is ascertained that 
the two functions have the same domain and codomain). A proof that f and g are 
equal typically has the following form. 


Proof. (argumentation)
Therefore the domain of $f$ is the same as the domain of $g$.
(argumentation)
Therefore the codomain of $f$ is the same as the codomain of $g$.
Let $a$ be in the domain of $f$ and $g$.
(argumentation)
Then $f(a) = g(a)$.
Therefore $f = g$.
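
The three-part comparison in this proof outline can be illustrated with finite sets. The Python sketch below is an illustration under stated assumptions: the class `Function` and the helper `square_plus_one` are hypothetical, and small finite sets stand in for $\mathbb{R}$ and $(0, \infty)$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    domain: frozenset
    codomain: frozenset
    pairs: frozenset  # subset of domain x codomain, as in Definition 4.1.1


def square_plus_one(domain, codomain):
    """Build the function x -> x^2 + 1 with the given domain and codomain."""
    return Function(frozenset(domain), frozenset(codomain),
                    frozenset((x, x * x + 1) for x in domain))


A = {-1, 0, 1}
f = square_plus_one(A, {0, 1, 2})   # codomain {0, 1, 2}
g = square_plus_one(A, {1, 2})      # same formula, smaller codomain
h = square_plus_one(A, {0, 1, 2})

print(f == h)              # True: same domain, codomain and pairs
print(f == g)              # False: the codomains differ
print(f.pairs == g.pairs)  # True: the pairs alone do not distinguish f from g
```

Equality of the dataclass instances compares all three fields, which matches the requirement that two equal functions have the same domain, the same codomain and the same set of pairs.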


There are some particularly useful types of functions that are encountered through- 
out mathematics. 


Definition 4.1.3. Let $A$ and $B$ be sets, and let $S \subseteq A$ be a subset.


1. A constant map $f\colon A \to B$ is any function of the form $f(x) = b$ for all $x \in A$,
where $b \in B$ is some fixed element.

2. The identity map on $A$ is the function $1_A\colon A \to A$ defined by $1_A(x) = x$ for
all $x \in A$.

3. The inclusion map from $S$ to $A$ is the function $j\colon S \to A$ defined by $j(x) = x$
for all $x \in S$.

4. If $f\colon A \to B$ is a function, the restriction of $f$ to $S$, denoted $f|_S$, is the function
$f|_S\colon S \to B$ defined by $f|_S(x) = f(x)$ for all $x \in S$.

5. If $g\colon S \to B$ is a function, an extension of $g$ to $A$ is any function $G\colon A \to B$
such that $G|_S = g$.

6. The projection maps from $A \times B$ are the functions $\pi_1\colon A \times B \to A$ and
$\pi_2\colon A \times B \to B$ defined by $\pi_1((a,b)) = a$ and $\pi_2((a,b)) = b$ for all $(a,b) \in A \times B$.
For any finite collection of sets $A_1, \ldots, A_p$, projection maps
$\pi_i\colon A_1 \times \cdots \times A_p \to A_i$ for all $i \in \{1, \ldots, p\}$ can be defined similarly. △


Example 4.1.4.


(1) Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \sin x$ for all $x \in \mathbb{R}$. Then the restriction
of $f$ to $\mathbb{Q}$ is the function $f|_{\mathbb{Q}}\colon \mathbb{Q} \to \mathbb{R}$ defined by $f|_{\mathbb{Q}}(x) = \sin x$ for all $x \in \mathbb{Q}$.

(2) Let $X = \{a,b,c\}$, let $Y = \{a,b\}$ and let $Z = \{1,2,3\}$. Let $f\colon Y \to Z$ be defined
by $f(a) = 3$ and $f(b) = 2$, and let $g, h\colon X \to Z$ be defined by $g(a) = 3$, and $g(b) = 2$,
and $g(c) = 1$, and $h(a) = 3$, and $h(b) = 1$, and $h(c) = 2$. Then $g$ is an extension of $f$,
because $g|_Y = f$, but $h$ is not an extension of $f$. There are other possible extensions
of $f$.

(3) We can think of $\mathbb{R}^2$ as $\mathbb{R} \times \mathbb{R}$. We then have the two projection maps
$\pi_1\colon \mathbb{R}^2 \to \mathbb{R}$ and $\pi_2\colon \mathbb{R}^2 \to \mathbb{R}$ that are defined by $\pi_1((x,y)) = x$ and $\pi_2((x,y)) = y$
for all $(x,y) \in \mathbb{R}^2$. That is, the projection map $\pi_1$ picks out the first coordinate of the point $(x,y)$,
for all $(x,y) \in \mathbb{R}^2$, and similarly for $\pi_2$.
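
Restrictions and extensions of functions on finite sets can be computed directly. The Python sketch below is an illustration only, using the data of Part (2) of Example 4.1.4; it assumes each function is stored as a dictionary from domain elements to values, with the codomain $Z$ left implicit, and the names `restrict` and `is_extension` are hypothetical.

```python
def restrict(f, S):
    """Return the restriction of f (a dict) to the subset S of its domain."""
    return {x: f[x] for x in S}


def is_extension(G, g):
    """Check whether G extends g, that is, whether G restricted to the
    domain of g equals g."""
    return set(g) <= set(G) and restrict(G, set(g)) == g


# Data from Part (2) of Example 4.1.4.
f = {"a": 3, "b": 2}            # f: Y -> Z
g = {"a": 3, "b": 2, "c": 1}    # g: X -> Z
h = {"a": 3, "b": 1, "c": 2}    # h: X -> Z

print(is_extension(g, f))  # True
print(is_extension(h, f))  # False, because h(b) = 1 but f(b) = 2

# Every extension of f to X is determined by its value at c, which may be
# any element of Z = {1, 2, 3}.
extensions = [{**f, "c": z} for z in (1, 2, 3)]
print(len(extensions))  # 3
```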


In addition to the general types of functions given in Definition 4.1.3, which
we will use throughout this text, we will also make use of some standard functions
$\mathbb{R} \to \mathbb{R}$, such as polynomials, exponentials, logarithms and trigonometric functions
in various examples. It is beyond the scope of this text to define these standard functions,
but they can indeed be defined rigorously, and all the familiar properties can
be proved, so no harm is done in our using these functions. See [Blo11, Chapter 7]
for rigorous definitions of such functions.

We conclude this section with a brief comment about the Axiom of Choice, which 
was first discussed in Section 3.5. In that section we did not have functions at our 
disposal, and hence our statement of the axiom in Theorem 3.5.3 was in terms of 
families of sets. However, it is much more natural, and convenient, to use functions 
to state the Axiom of Choice, which we now do. We do not need to prove this new 
version of the Axiom of Choice, because it is simply a restatement of Theorem 3.5.3. 


Theorem 4.1.5 (Axiom of Choice—Functions Version). Let $I$ be a non-empty set,
and let $\{A_i\}_{i \in I}$ be a family of non-empty sets indexed by $I$. Then there is a function
$f\colon I \to \bigcup_{i \in I} A_i$ such that $f(i) \in A_i$ for all $i \in I$.


The function given in Theorem 4.1.5 is called a choice function for $\{A_i\}_{i \in I}$. It
is also possible to formulate a non-indexed version of the Axiom of Choice using 
functions, which we leave to the reader in Exercise 4.1.9. 
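
For a finite family of explicitly listed sets of numbers, a choice function can be written down by a definite rule, and in that situation the Axiom of Choice is not needed. The Python sketch below illustrates only this simple case and says nothing about arbitrary families; the name `choice_function` and the sample family are hypothetical, and the rule "take the least element" is an assumption made for the illustration.

```python
def choice_function(family):
    """Return a dict f with f[i] in family[i] for every index i.

    Assumes a finite family of non-empty finite sets of comparable
    elements, so that min gives an explicit rule for choosing.
    """
    return {i: min(A_i) for i, A_i in family.items()}


family = {1: {4, 7}, 2: {0}, 3: {5, 2, 9}}   # hypothetical sample family
f = choice_function(family)

print(f)  # {1: 4, 2: 0, 3: 2}
print(all(f[i] in family[i] for i in family))  # True
```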


Exercises 


Exercise 4.1.1. Let $A = \{a,b,c\}$ and $B = \{1,2,3\}$. Which of the following subsets
of $A \times B$ are functions $A \to B$?

(1) $\{(b,1), (c,2), (a,3)\}$.
(2) $\{(a,3), (c,2), (a,1)\}$.
(3) $\{(c,1), (b,1), (a,2)\}$.
(4) $\{(a,1), (b,3)\}$.
(5) $\{(c,1), (a,2), (b,3), (c,2)\}$.
(6) $\{(a,3), (c,3), (b,3)\}$.

Exercise 4.1.2. Let X denote the set of all people. Which of the following descriptions
define functions X → X?


(1) f(a) is the mother of a. 

(2) g(a) is a brother of a. 

(3) h(a) is the best friend of a. 

(4) k(a) is the firstborn child of a if she is a parent, and is the father of a other- 
wise. 

(5) j(a) is the sibling of a if she has siblings, and is a otherwise. 


Exercise 4.1.3. Which of the diagrams in Figure 4.1.4 represent functions? 


[Four diagrams, labeled (i) through (iv).]


Fig. 4.1.4. 


Exercise 4.1.4. Which of the following descriptions properly describe functions? 


(1) Let f(x) = cos x.
(2) To every person a, let g(a) be the height of a in inches.
(3) For every real number, assign the real number that is the logarithm of the
original number.
(4) Let g: R → R be defined by g(x) = e^x.


Exercise 4.1.5. Which of the following formulas define functions R → R?
(1) f(x) =sinx for all x ER. x itx>1
(65) sx)=4 5 
243 x, ifx<0. 
(2) p(x) = +45 forallx eR. 
; =k. eo 
4 (6) t(x) = : 
(3) g(x) =In(x* + 1) for allx ER. |x|, ifx< 1. 


r ‘ _ jsinx, ifx>a 
(4) r(x) = \ et y= i 


: x ifx< 7. 
cosx, ifx<0. : 


Exercise 4.1.6. For each of the following formulas, find the largest subset X C R 
such that g: X — R is a function. 


ifx€ X andx>0 


(1) g(x) = ga ee 

(2) g(x) = < 1—x? for allx EX. 

(3) g(x) = a (sinx) for all x € X. 
(x) = 


4 
(4) 8 a ifx eX andx <0. 


tanax+4, ifxe xX andx>1 
3x2 +1, ifxeXandx<l. 


rT | 


(5) g(x 


Exercise 4.1.7. Let A and B be sets, let S ⊆ A be a subset and let f: A → B be a
function. Let g: A → B be an extension of f|_S to A. Does g equal f? Give a proof or
a counterexample.


Exercise 4.1.8. [Used in Theorem 4.5.4 and Section 8.7.] Let X be a non-empty set,
and let S ⊆ X be a subset. The characteristic map for S in X, denoted χ_S, is the
function χ_S: X → {0,1} defined by


χ_S(y) = 1, if y ∈ S
         0, if y ∈ X − S.


Let A,B ⊆ X be subsets. Prove that χ_A = χ_B if and only if A = B. (Observe that
“χ_A = χ_B” is a statement of equality of functions, whereas “A = B” is a statement of
equality of sets.)


Exercise 4.1.9. [Used in Section 4.1.] Restate Theorem 4.1.5 in a non-indexed ver- 
sion. 


Exercise 4.1.10. [Used in Exercise 4.1.11.] Let A and B be sets. A partial function
from A to B is a function of the form f_J: J → B, where J ⊆ A. We can think of partial
functions from A to B as subsets of A × B that satisfy a certain condition.

Let f_J and g_K be partial functions from A to B. Prove that f_J ⊆ g_K if and only if
J ⊆ K and g_K|_J = f_J.


Exercise 4.1.11. [Used in Section 3.5.] The purpose of this exercise is to prove that
Zorn’s Lemma (Theorem 3.5.6) implies the Axiom of Choice. Given that we used
the latter in the proof of the former, it will follow that the two results are equivalent.
We make use here of the version of the Axiom of Choice stated in Theorem 4.1.5.
Let I be a non-empty set, and let {A_i}_{i∈I} be a family of non-empty sets indexed by
I. Assume Zorn’s Lemma. We will show that there is a choice function for {A_i}_{i∈I}.


(1) A partial choice function for {A_i}_{i∈I} is a function f_J: J → ⋃_{j∈J} A_j for some
J ⊆ I such that f_J(j) ∈ A_j for all j ∈ J. If f_J is a partial choice function for
{A_i}_{i∈I}, we can think of f_J as a subset of J × ⋃_{j∈J} A_j ⊆ I × ⋃_{i∈I} A_i.

(2) Let 𝒫 be the set of all partial choice functions for {A_i}_{i∈I}, and let 𝒞 be a chain
in 𝒫. Prove that ⋃_{C∈𝒞} C is in 𝒫. [Use Exercise 4.1.10.]

(3) By Zorn’s Lemma the family of sets 𝒫 has a maximal element. Let f_K ∈ 𝒫 be
such a maximal element. Prove that K = I. (Recall that the Axiom of Choice
is not needed to choose an element from a single non-empty set.)

(4) Deduce Theorem 4.1.5.


Let A denote the set of all adults (defined for example to be people 18 years
and older), and let h: A → R be defined by letting h(x) be the height in inches
of person x. There are a number of things we might want to do with this function.
For example, we might want to find the various heights found among all
adults living in France. This set of heights would be written in set notation as
{h(x) | x lives in France}. Alternatively, and more useful to us, we could write this
set as {r ∈ R | r = h(x) for some x who lives in France}. What we are doing here is
taking a subset of the domain, namely, all adults living in France, and finding the
corresponding subset of the codomain, namely, all possible real numbers that arise
as the heights of adults in France.

We might also want to find all the adults whose heights are at least 6 ft. and 
no more than 6 ft. 3 in. Because we are working in inches, we therefore want to 
find all people whose heights are in the interval [72,75]. Hence, we want the set
{x ∈ A | h(x) ∈ [72,75]}. In this case we are taking a subset of the codomain, namely,
a certain set of possible heights, and finding the corresponding subset of the domain, 
namely, all people whose heights are as desired. 

The following definition generalizes the above process. Given a function f: A → B,
we want to take each subset P of A, and see where f sends all of its elements
(which will give us a subset of B), and we want to take each subset Q of B, and see 
which elements of A are mapped into it by f (which will give us a subset of A). 


Definition 4.2.1. Let A and B be sets, and let f: A → B be a function.
1. Let P ⊆ A. The image of P under f, denoted f(P), is the set defined by


f(P) = {b ∈ B | b = f(p) for some p ∈ P}.


The range of f (also called the image of f) is the set f(A).
2. Let Q ⊆ B. The inverse image of Q under f, denoted f⁻¹(Q), is the set
defined by


f⁻¹(Q) = {a ∈ A | f(a) ∈ Q}. △

See Figure 4.2.1 for a schematic drawing of f(P) and f⁻¹(Q).


[Schematic drawing of f(P) and f⁻¹(Q).]


Fig. 4.2.1. 


Example 4.2.2. Let f: R → R be defined by f(x) = x² − 6x for all x ∈ R. It is
straightforward to compute that f([6,7]) = [0,7], that f⁻¹([0,4]) = [3 − √13, 0] ∪
[6, 3 + √13], that f⁻¹([−12,−10]) = ∅ and that the range of the function is [−9,∞);
the details are left to the reader (it helps to graph the function). ♦
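
For functions between finite sets, Definition 4.2.1 can be turned directly into a computation. The following Python sketch is an illustration only; the names `image` and `inverse_image`, the encoding of a function as a dictionary, and the sample function are assumptions made here and are not notation from the text.

```python
def image(f, P):
    """Return f(P) = {b | b = f(p) for some p in P}; f is a dict on its domain."""
    return {f[p] for p in P}

def inverse_image(f, Q):
    """Return f^{-1}(Q) = {a in the domain of f | f(a) in Q}."""
    return {a for a in f if f[a] in Q}

# A sample function f: A -> B with A = {1, 2, 3, 4} and B = {"x", "y", "z"}.
f = {1: "x", 2: "x", 3: "y", 4: "y"}

print(sorted(image(f, {1, 2})))          # ['x']
print(sorted(inverse_image(f, {"y"})))   # [3, 4]
print(sorted(inverse_image(f, {"z"})))   # [], because "z" is not in the range
print(sorted(image(f, set(f))))          # ['x', 'y'], the range of f
```

Observe that `inverse_image` never uses an inverse function; it only tests whether f(a) lies in Q, exactly as in Part (2) of the definition.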


In Part (1) of Definition 4.2.1 it would have been possible to have written 


f(P) = {f(p) | p ∈ P},


and in Part (2) of the definition it would have been possible to have written


f⁻¹(Q) = {a ∈ A | f(a) = q for some q ∈ Q}.


However, the method of defining these sets given in Definition 4.2.1 will be more 
useful to us than these alternatives. 

The terms “range” and “codomain” are often confused, so precise use of language 
is needed. 

The notations “f(P)” and “f⁻¹(Q)” are widely used, and so we will use them too,
but they need some clarification. The notation “f(P)” is not formally meaningful,
because only elements of the domain (not subsets of it) can be substituted into f.
Writing f(P) is an example of what mathematicians refer to as “abuse of notation,”
which means a way of writing something that is technically incorrect, but which is
convenient to use and which causes no problems.

Unfortunately, it cannot be said that the notation “f⁻¹(Q)” causes no problems.
We urge the reader to use this notation with caution, for the following reason. Later
in this chapter, we will discuss the notion of an inverse function (in Definition 4.3.6).
If a function f: A → B has an inverse function (which is not true for all functions),
then the inverse function is denoted f⁻¹. Even though this latter notation is very
similar to the notation f⁻¹(Q), the concept of an inverse image of a set and the
concept of an inverse function are quite different, and it is the similarity of notation
for different concepts that is the source of the problem. The inverse image f⁻¹(Q) is
a subset of the domain of f, and is always defined for any function f and any subset
Q of the codomain. By contrast, the inverse function f⁻¹ does not always exist; if
it does exist, then it is a function B → A, not a subset of A. It is very important to
keep in mind that the notation f⁻¹(Q) does not necessarily mean the image of the
set Q under f⁻¹, because f⁻¹(Q) is used even in cases where the function f⁻¹ does
not exist. Proofs about sets of the form f⁻¹(Q) should not make use of an inverse
function f⁻¹ unless there is a specific reason to assume that f⁻¹ exists.

The following example should further demonstrate that the notation f⁻¹(D)
should be used with caution, because in this context the notations “f” and “f⁻¹”
do not necessarily “cancel each other out,” as might be mistakenly assumed.


Example 4.2.3. Let h: R → R be defined by h(x) = x² for all x ∈ R. It is straightforward
to compute that h([0,3]) = [0,9] and h([−2,2]) = [0,4]. Then h⁻¹(h([0,3])) =
h⁻¹([0,9]) = [−3,3]. We therefore see that h⁻¹(h([0,3])) ≠ [0,3]. Similarly, we
compute that h(h⁻¹([−4,4])) = h([−2,2]) = [0,4], and hence h(h⁻¹([−4,4])) ≠
[−4,4]. ♦
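
The same failure of "cancellation" can be observed on a finite analog of Example 4.2.3. The sketch below restricts the squaring function to the integers from −3 to 3; this restriction, the helper names and the dictionary encoding are assumptions made for illustration, not part of the example itself.

```python
def image(f, P):
    """f(P) for a function f stored as a dict."""
    return {f[p] for p in P}

def inverse_image(f, Q):
    """f^{-1}(Q): all domain elements that f maps into Q."""
    return {a for a in f if f[a] in Q}

# The squaring function on the finite domain {-3, ..., 3}.
h = {x: x * x for x in range(-3, 4)}

P = {0, 1, 2, 3}
Q = set(range(-4, 5))            # {-4, ..., 4}

# h^{-1}(h(P)) picks up the negative numbers as well, so it is larger than P.
print(sorted(inverse_image(h, image(h, P))))   # [-3, -2, -1, 0, 1, 2, 3]

# h(h^{-1}(Q)) loses every element of Q that is not a square in the range.
print(sorted(image(h, inverse_image(h, Q))))   # [0, 1, 4]
```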


For the proof of the following theorem, as well as for subsequent results involving 
images and inverse images, we need two observations about proof strategies. First, 
suppose that we wish to prove that either of f(P) or f⁻¹(Q) is equal to some other
set. Though we are dealing with functions, we observe that objects of the form f(P)
or f⁻¹(Q) are sets, and to prove that they are equal to other sets (which is the only
sort of thing to which they could be equal), we use the standard strategy for proving
equality of sets, which is showing that each set is a subset of the other.

Second, we mention that statements of the form “x ∈ f(P)” and “z ∈ f⁻¹(Q)”
are difficult to work with directly, and it is usually easier if we first transform such
statements into equivalent ones that do not involve images and inverse images. More
specifically, suppose that we start with a statement of the form “x ∈ f(P).” The definition
of f(P) then allows us to rewrite the statement as “x = f(a) for some a ∈ P,”
and we observe that this latter statement does not involve the image of a set, making
it easier to work with than the original statement. Conversely, a statement of the form
“x = f(a) for some a ∈ P” can be rewritten as “x ∈ f(P).” Similarly, suppose that we
start with a statement of the form “z ∈ f⁻¹(Q).” The definition of f⁻¹(Q) allows us
to rewrite the statement as “f(z) ∈ Q,” which again is easier to work with than the
original statement. Conversely, a statement of the form “f(z) ∈ Q” can be rewritten
as “z ∈ f⁻¹(Q).” As is the case with many problems in mathematics, going back to
the definitions is often the best way to start creating a proof. 

The following theorem, the proof of which uses the above mentioned strategies, 
gives some of the most basic properties of images and inverse images. Observe in 
Part (7) of the theorem that images are not quite as well behaved as inverse images.


Theorem 4.2.4. Let A and B be sets, let C,D ⊆ A and S,T ⊆ B be subsets, and let
f: A → B be a function. Let I and K be non-empty sets, let {U_i}_{i∈I} be a family of
subsets of A indexed by I, and let {V_k}_{k∈K} be a family of subsets of B indexed by K.


1. f(∅) = ∅ and f⁻¹(∅) = ∅.
2. f⁻¹(B) = A.
3. f(C) ⊆ S if and only if C ⊆ f⁻¹(S).
4. If C ⊆ D, then f(C) ⊆ f(D).
5. If S ⊆ T, then f⁻¹(S) ⊆ f⁻¹(T).
6. f(⋃_{i∈I} U_i) = ⋃_{i∈I} f(U_i).
7. f(⋂_{i∈I} U_i) ⊆ ⋂_{i∈I} f(U_i).
8. f⁻¹(⋃_{k∈K} V_k) = ⋃_{k∈K} f⁻¹(V_k).
9. f⁻¹(⋂_{k∈K} V_k) = ⋂_{k∈K} f⁻¹(V_k).


Proof. We will prove Parts (5) and (6), leaving the rest to the reader in Exercise 4.2.6. 


(5). Suppose that S ⊆ T. Let x ∈ f⁻¹(S). Then by definition f(x) ∈ S. Because
S ⊆ T, it follows that f(x) ∈ T. Hence x ∈ f⁻¹(T). We deduce that f⁻¹(S) ⊆ f⁻¹(T).


(6). First, let b ∈ f(⋃_{i∈I} U_i). Then b = f(u) for some u ∈ ⋃_{i∈I} U_i. Therefore
u ∈ U_j for some j ∈ I. Hence b ∈ f(U_j) ⊆ ⋃_{i∈I} f(U_i). It follows that f(⋃_{i∈I} U_i) ⊆
⋃_{i∈I} f(U_i). Next, let a ∈ ⋃_{i∈I} f(U_i). Then a ∈ f(U_k) for some k ∈ I. Hence a =
f(v) for some v ∈ U_k. Because v ∈ ⋃_{i∈I} U_i, it follows that a ∈ f(⋃_{i∈I} U_i). Therefore
⋃_{i∈I} f(U_i) ⊆ f(⋃_{i∈I} U_i). We conclude that f(⋃_{i∈I} U_i) = ⋃_{i∈I} f(U_i).
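
Parts (6) through (9) of Theorem 4.2.4 can be spot-checked on a small finite example, which also shows that the inclusion in Part (7) can be proper. The sketch below is not a proof; the function `f`, the sets `U1`, `U2`, `V1`, `V2` and the helper names are assumptions chosen for illustration.

```python
def image(f, P):
    """f(P) for a function f stored as a dict."""
    return {f[p] for p in P}

def inverse_image(f, Q):
    """f^{-1}(Q): all domain elements that f maps into Q."""
    return {a for a in f if f[a] in Q}

# A function that is not injective: both 1 and 2 are sent to "a".
f = {1: "a", 2: "a", 3: "b"}
U1, U2 = {1, 3}, {2, 3}
V1, V2 = {"a"}, {"a", "b"}

# Part (6): images commute with unions.
assert image(f, U1 | U2) == image(f, U1) | image(f, U2)

# Part (7): for intersections only an inclusion holds, and here it is proper.
lhs = image(f, U1 & U2)              # {"b"}
rhs = image(f, U1) & image(f, U2)    # {"a", "b"}
assert lhs < rhs

# Parts (8) and (9): inverse images commute with unions and intersections.
assert inverse_image(f, V1 | V2) == inverse_image(f, V1) | inverse_image(f, V2)
assert inverse_image(f, V1 & V2) == inverse_image(f, V1) & inverse_image(f, V2)
```

The proper inclusion arises because 1 ∈ U1 and 2 ∈ U2 have the same value under f even though neither lies in the intersection of U1 and U2.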


We conclude our discussion of images and inverse images with a slightly more
abstract approach to the subject. Let f: A → B be a function. If P ⊆ A, then f(P) ⊆ B.
That is, for every element P ∈ 𝒫(A), we obtain an element f(P) ∈ 𝒫(B). Hence, the
process of taking images of subsets of A amounts to the fact that the function f
induces a new function f_*: 𝒫(A) → 𝒫(B), which is defined by f_*(P) = f(P) for all
P ∈ 𝒫(A). Similarly, for every element Q ∈ 𝒫(B), we obtain an element f⁻¹(Q) ∈
𝒫(A), and hence, the process of taking inverse images of subsets of B amounts to the
fact that the function f induces a new function f^*: 𝒫(B) → 𝒫(A), which is defined
by f^*(Q) = f⁻¹(Q) for all Q ∈ 𝒫(B). From that point of view, the terms “image” and
“inverse image” are redundant. For example, the notation “f_*(P)” simply means the
result of applying the function f_* to the element P in the domain of f_*, and hence we
do not need to call “f_*(P)” by the special name “the image of P under f.” However,
although this more abstract point of view is a technically correct way to think of
images and inverse images, it is usually more useful in the course of formulating
proofs to think of images and inverse images as we have done until now, and so we
will not be making use of this more abstract approach other than in a few exercises.




Exercises 


Exercise 4.2.1. Find the range of each of the following functions. 


(1) Let f: R— R be defined by f(x) =x° —5 forall x ER.
(2) Let g: R— R be defined by g(x) =x? —x* forallxE R.

(3) Let h: R — (0,0) be defined by h(x) = e*!+3 for allx ER. 

(4) Let p: R— R be defined by p(x) = Vx*+5 for allx ER. 

(5) Let g: R — [—10, 10] be defined by g(x) = sinx +cos.x for all x € R. 


Exercise 4.2.2. Let C be the set of all cows in the world. Let m: C — R be the 
function defined by letting m(c) equal the average daily milk production in gallons 
of cow c. Describe in words each of the following sets. 


(1) m({Bessie, Bossie}). 
(2) m(F), where F denotes all the cows in India. 
(3) m⁻¹((1,3]).
(4) m⁻¹([−5,3]).
(5) m⁻¹({0}).

Exercise 4.2.3. For each of the following functions f: R → R and each set T ⊆ R,
find f(T), f⁻¹(T), f(f⁻¹(T)) and f⁻¹(f(T)).
(1) Let f: R → R be defined by f(x) = (x+1)² for all x ∈ R, and let T = [−1,1].
(2) Let f: R → R be defined by f(x) = (x+1)² for all x ∈ R, and let T = [−5,2].
(3) Let f: R → R be defined by f(x) = ⌈x⌉ for all x ∈ R, where ⌈x⌉ is the smallest
integer greater than or equal to x, and let T = (1,3).
(4) Let f: R → R be defined by f(x) = ⌊x⌋ for all x ∈ R, where ⌊x⌋ is the greatest
integer less than or equal to x, and let T = [0,2] ∪ (5,7).


Exercise 4.2.4. Let g: R² → R be defined by g((x,y)) = xy for all (x,y) ∈ R². Sketch
each of the following subsets of R².


(1) g⁻¹({3}). (2) g⁻¹([−1,1]).


Exercise 4.2.5. Let X and Y be sets, let A ⊆ X and B ⊆ Y be subsets and let
π_1: X × Y → X and π_2: X × Y → Y be projection maps as defined in Section 4.1.


(1) Prove that (π_1)⁻¹(A) = A × Y and (π_2)⁻¹(B) = X × B.
(2) Prove that (π_1)⁻¹(A) ∩ (π_2)⁻¹(B) = A × B.
(3) Let P ⊆ X × Y. Does π_1(P) × π_2(P) = P? Give a proof or a counterexample.


Exercise 4.2.6. [Used in Theorem 4.2.4.] Prove Theorem 4.2.4 (1) (2) (3) (4) (7) (8) 
(9). 


Exercise 4.2.7. Find the flaw(s) in the following alleged proof of Theorem 4.2.4 (8),
assuming that Parts (1)–(7) have already been proved: “Applying f to f⁻¹(⋃_{k∈K} V_k)
we obtain f(f⁻¹(⋃_{k∈K} V_k)) = ⋃_{k∈K} V_k. Applying f to ⋃_{k∈K} f⁻¹(V_k), and using
Part (6) of the theorem, we obtain f(⋃_{k∈K} f⁻¹(V_k)) = ⋃_{k∈K} f(f⁻¹(V_k)) = ⋃_{k∈K} V_k.
Because applying f to both sides of the equation in Part (8) yields the same result,
we deduce that the equation in Part (8) is true.”


Exercise 4.2.8. In this exercise we show that it is not possible to strengthen Theorem
4.2.4 (3).

(1) Find an example of a function f: A → B together with sets X ⊆ A and Y ⊆ B
such that f(X) = Y and X ≠ f⁻¹(Y).

(2) Find an example of a function g: J → K together with sets Z ⊆ J and W ⊆ K
such that g⁻¹(W) = Z and g(Z) ≠ W.


Exercise 4.2.9. Find an example to show that the “⊆” in Theorem 4.2.4 (7) cannot
be replaced with “=.” It is sufficient to use the intersection of two sets.


Exercise 4.2.10. 


(1) Find an example of a function f: A → B and subsets P,Q ⊆ A such that P ⊊ Q,
but that f(P) = f(Q).

(2) Find an example of a function g: C → D and subsets S,T ⊆ D such that
S ⊊ T, but that g⁻¹(S) = g⁻¹(T).


Exercise 4.2.11. [Used in Exercise 4.4.11.] Let A and B be sets, let P,Q ⊆ A be subsets
and let f: A → B be a function.


(1) Prove that f(P) − f(Q) ⊆ f(P − Q).
(2) Is it necessarily the case that f(P − Q) ⊆ f(P) − f(Q)? Give a proof or a
counterexample. 


Exercise 4.2.12. Let A and B be sets, let C,D ⊆ B be subsets and let f: A → B be a
function. Prove that f⁻¹(D − C) = f⁻¹(D) − f⁻¹(C).


Exercise 4.2.13. Let A and B be sets, let X ⊆ A and Y ⊆ B be subsets and let f: A →
B be a function.


(1) Prove that X ⊆ f⁻¹(f(X)).
(2) Prove that f(f⁻¹(Y)) ⊆ Y.
(3) Prove that X = f⁻¹(f(X)) if and only if X = f⁻¹(Z) for some Z ⊆ B.
(4) Prove that Y = f(f⁻¹(Y)) if and only if Y = f(W) for some W ⊆ A.
(5) Prove that f(f⁻¹(f(X))) = f(X).
(6) Prove that f⁻¹(f(f⁻¹(Y))) = f⁻¹(Y).


Exercise 4.2.14. Let A and B be sets, and let f,g: A → B be functions. Think of these
functions as inducing functions f_*,g_*: 𝒫(A) → 𝒫(B), and functions f^*,g^*: 𝒫(B) →
𝒫(A). Prove that f_* = g_* if and only if f^* = g^* if and only if f = g.


Exercise 4.2.15. [Used in Exercise 6.5.15.] Let A be a non-empty set, and let
g: 𝒫(A) → 𝒫(A) be a function. The function g is monotone if X ⊆ Y implies
g(X) ⊆ g(Y) for all X,Y ∈ 𝒫(A).

Suppose that g is monotone.


(1) Let 𝒟 be a family of subsets of A. Prove that g(⋂_{X∈𝒟} X) ⊆ ⋂_{X∈𝒟} g(X). It is
not sufficient simply to cite Theorem 4.2.4 (7), because it is not necessarily
the case that g = f_* for some function f: A → A.

(2) Prove that there is some T ∈ 𝒫(A) such that g(T) = T. Such an element T is
called a fixed point of g. Use Part (1) of this exercise.


4.3 Composition and Inverse Functions


Functions can be combined to form new functions in a variety of ways. One simple 
way of combining functions that is seen in courses such as calculus is to add or 
multiply functions R → R. Though very useful, this method of combining functions
is not applicable to all sets, because the ability to add or multiply functions R →
R relies upon the addition or multiplication of the real numbers, and not all sets
have such operations. A more broadly applicable way of combining functions, also
encountered in calculus, is seen when the Chain Rule for taking derivatives is used.
This rule is used with functions such as f: R → R defined by f(x) = √(x² + 3) for
all x ∈ R, which are built up out of a function “inside” a function. The following
definition formalizes this notion. 


Definition 4.3.1. Let A, B and C be sets, and let f: A → B and g: B → C be functions.
The composition of f and g is the function g ∘ f: A → C defined by


(g ∘ f)(x) = g(f(x))

for all x ∈ A. △


Observe that the notation “g ∘ f” in Definition 4.3.1 is the name of a single function
A → C, which we constructed out of the two functions f and g. By contrast,
the notation “(g ∘ f)(x)” denotes a single value in the set C. It would not be correct
to write “g ∘ f(x),” because ∘ is an operation that combines two functions, whereas
“f(x)” is not a function but a single element in the set B. Observe also that for the
composition of two functions to be defined, the codomain of the first function must
equal the domain of the second function.

The reader who is encountering the notation g ∘ f for the first time might find it
necessary to get used to the fact that it is “backwards” from what might be expected,
because g ∘ f means doing f first and then g even though we generally read from left
to right in English. Think of “∘” as meaning “following.” We will stick with the “∘”
notation in spite of any slight confusion it might cause at first, because it is extremely
widespread, and because the reader will find that it works well once she is used to it.


Example 4.3.2.


(1) Let P be the set of all people, and let m: P → P be the function that assigns
to each person her mother. Then m ∘ m is the function that assigns to each person her
maternal grandmother.

(2) Let f,g: R → R be defined by f(x) = x² and g(x) = x + 3 for all x ∈ R.
Then both f ∘ g and g ∘ f are defined, and (f ∘ g)(x) = (x + 3)² for all x ∈ R, and
(g ∘ f)(x) = x² + 3 for all x ∈ R.

(3) Let k: R → R be defined by k(x) = sin x for all x ∈ R, and let h: (0,∞) →
R be defined by h(x) = ln x for all x ∈ (0,∞). Then k ∘ h is defined, and is given
by (k ∘ h)(x) = sin(ln x) for all x ∈ (0,∞). On the other hand, we cannot form the
composition h ∘ k, because the domain of h is not the same as the codomain of k,
reflecting the observation that ln(sin x) is not defined for all x ∈ R. ♦


One way to visualize the composition of functions is to use “commutative diagrams.”
If f: A → B and g: B → C are functions, then we can form g ∘ f: A → C,
and we can represent all three of these functions in the following diagram.


    A
    |  \
  f |    \  g ∘ f
    v      v
    B ----> C
       g


This diagram is referred to as a commutative diagram, which means that if we 
start with any element x € A, and trace what happens to it going along either of the 
two possible paths from A to C, we end up with the same result. If we go first down 
and then across, the result is g(f(x)), and if we go diagonally, the result is (g ∘ f)(x).
Commutative diagrams (often much more complicated than the one seen above) are 
important in some branches of mathematics, for example algebraic topology. 
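
Definition 4.3.1 and the two paths through the diagram can be illustrated computationally. The following Python sketch uses the two functions from Example 4.3.2 (2); the helper name `compose` is an assumption made here, not notation from the text.

```python
def compose(g, f):
    """Return g ∘ f, the function x -> g(f(x)); note that f is applied first."""
    return lambda x: g(f(x))

def f(x):
    return x ** 2        # f(x) = x^2

def g(x):
    return x + 3         # g(x) = x + 3

g_after_f = compose(g, f)    # (g ∘ f)(x) = x^2 + 3
f_after_g = compose(f, g)    # (f ∘ g)(x) = (x + 3)^2

# The two paths of the commutative diagram agree at each tested point.
assert all(g_after_f(x) == g(f(x)) for x in range(-5, 6))

# Composition is not commutative in general.
print(g_after_f(1), f_after_g(1))   # 4 16
```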

An example of the use of the composition of functions is coordinate functions. In 
multivariable calculus it is standard to write functions into Rⁿ in terms of coordinate
functions, and we can now generalize this notion to arbitrary sets. 


Definition 4.3.3. Let B be a set, let A_1,...,A_n be sets for some n ∈ N and let
f: B → A_1 × ··· × A_n be a function. For each i ∈ {1,...,n}, the i-th coordinate
function of f, denoted f_i, is the function f_i: B → A_i defined by f_i = π_i ∘ f, where
π_i: A_1 × ··· × A_n → A_i is the projection map. △


The fact that f_i = π_i ∘ f for all i ∈ {1,...,n}, as given in Definition 4.3.3, means
that f(x) = (f_1(x),...,f_n(x)) for all x ∈ B. In some texts this fact is abbreviated
by writing f = (f_1,...,f_n), or alternatively by writing f = f_1 × ··· × f_n. However,
although the notations (f_1,...,f_n) and f_1 × ··· × f_n could be formally defined
to be the function we have denoted f, the reader is urged to use these two
notations with caution, or to avoid them altogether, for the following reason. Whereas
writing f(x) = (f_1(x),...,f_n(x)) is perfectly sensible, the two sides of the equation
being different expressions for the same element of A_1 × ··· × A_n, using the
notation f = (f_1,...,f_n) might mistakenly suggest that the function f is an element
of the product of n sets, which is not necessarily true, and using the notation
f = f_1 × ··· × f_n might mistakenly suggest that f is the product of n sets, which is
also not necessarily true.

Coordinate functions when n = 2 can be represented by the following commuta- 
tive diagram. Each triangle of functions in the diagram is commutative in the sense 
described previously. 


              B
           /  |  \
     f_1 /    | f   \ f_2
        v     v      v
     A_1 <-- A_1 × A_2 --> A_2
         π_1           π_2


Example 4.3.4. Let f: R² → R³ be defined by


f((x,y)) = (xy, sin x², x + y³)


for all (x,y) ∈ R². The three coordinate functions of f are f_1, f_2, f_3: R² → R defined
by

f_1((x,y)) = xy, f_2((x,y)) = sin x², and f_3((x,y)) = x + y³

for all (x,y) ∈ R². ♦
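
The relation f_i = π_i ∘ f of Definition 4.3.3 can be checked mechanically. In the sketch below the function `f`, with integer values so that comparisons are exact, is a hypothetical stand-in rather than the function of Example 4.3.4, and the helper names `compose` and `projection` are likewise assumptions.

```python
def compose(g, f):
    """Return g ∘ f, the function x -> g(f(x))."""
    return lambda x: g(f(x))

def projection(i):
    """Return the projection onto coordinate i (0-based) of a tuple."""
    return lambda t: t[i]

def f(p):
    """A hypothetical function from pairs of integers to triples of integers."""
    x, y = p
    return (x * y, x + y, x - y)

# The coordinate functions f_1, f_2, f_3, each defined as a projection after f.
f1, f2, f3 = (compose(projection(i), f) for i in range(3))

point = (2, 5)
print(f(point))                           # (10, 7, -3)
print(f1(point), f2(point), f3(point))    # 10 7 -3

# f(x) = (f_1(x), f_2(x), f_3(x)), as an equality of elements, not of functions.
assert f(point) == (f1(point), f2(point), f3(point))
```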


Which of the familiar properties of operations (for example commutativity and 
associativity) hold for the composition of functions? The Commutative Law, which 
for the real numbers and addition states that a + b = b + a for all a,b ∈ R, does
not hold for functions and composition, for two reasons. First, suppose that we have
functions f: A → B and g: B → C, so that we can form g ∘ f. Unless it happens to be
the case that A = C, then we could not even form f ∘ g, and so commutativity is not
relevant. Even in situations where we can form composition both ways, however, the 
Commutative Law does not always hold, as seen in Example 4.3.2 (2). The following 
lemma shows, however, that some nice properties do hold for composition. 


Lemma 4.3.5. Let A, B, C and D be sets, and let f: A → B and g: B → C and
h: C → D be functions.


1. (h ∘ g) ∘ f = h ∘ (g ∘ f) (Associative Law).
2. f ∘ 1_A = f and 1_B ∘ f = f (Identity Law).


Proof.


(1). It is seen from the definition of composition that both (h ∘ g) ∘ f and h ∘ (g ∘ f)
have the same domain, the set A, and the same codomain, the set D. If a ∈ A, then


((h ∘ g) ∘ f)(a) = (h ∘ g)(f(a)) = h(g(f(a)))
= h((g ∘ f)(a)) = (h ∘ (g ∘ f))(a).


Hence (h ∘ g) ∘ f = h ∘ (g ∘ f).

(2). This part is straightforward, and is left to the reader.


Do functions have inverses under composition? That is, for any given function 
is there another that “cancels it out” by composition? In arithmetic, for example, 
we can cancel out the number 3 by adding —3 to it, which yields 0. For functions, 
the operation addition and the number 0 are replaced with composition of functions 
and the identity map, respectively. However, the non-commutativity of composition 
means that we need a bit more care when we define “canceling out” for functions 
than we do with addition (which is commutative). 


Definition 4.3.6. Let A and B be sets, and let f: A → B and g: B → A be functions.


1. The function g is a right inverse for f if f ∘ g = 1_B.
2. The function g is a left inverse for f if g ∘ f = 1_A.
3. The function g is an inverse for f if it is both a right inverse and a left
inverse. △


Definition 4.3.6 (1) (2) was stated in the most concise possible way using only
the names of the functions involved. In practice, however, it is often convenient to
use the fact that f ∘ g = 1_B means f(g(x)) = x for all x ∈ B, and that g ∘ f = 1_A
means g(f(x)) = x for all x ∈ A. Also, although we used the term “an inverse” in
Definition 4.3.6 (3), it is seen in Part (1) of the following result that we could actually
have written “the inverse.”


Lemma 4.3.7. Let A and B be sets, and let f: A → B be a function.


1. If f has an inverse, then the inverse is unique.

2. If f has a right inverse g and a left inverse h, then g = h, and hence f has an
inverse.

3. If g is an inverse of f, then f is an inverse of g.


Proof.

(1). Suppose that g,h: B → A are both inverses of f. We will show that g = h. By
hypothesis on g and h we know, among other things, that f ∘ g = 1_B and h ∘ f = 1_A.
Using Lemma 4.3.5 repeatedly we then have

g = 1_A ∘ g = (h ∘ f) ∘ g = h ∘ (f ∘ g) = h ∘ 1_B = h.

(2). The proof is the same as in Part (1).


(3). Suppose that g: B → A is an inverse of f. Then g ∘ f = 1_A and f ∘ g = 1_B.
By the definition of inverses, it follows that f is an inverse of g.


Observe that the proof of Lemma 4.3.7 (1) is virtually identical to the proof of 
the uniqueness part of Theorem 2.5.2. The same proof in a more generalized setting 
is also used for Lemma 7.2.4. Lemma 4.3.7 (1) allows us to make the following 
definition. 


Definition 4.3.8. Let A and B be sets, and let f: A → B be a function. If f has an
inverse, the inverse is denoted f⁻¹: B → A. △


It is important to keep in mind the great difference in meaning between the notation
“f⁻¹(Q)” discussed in Section 4.2 and the notation “f⁻¹” given in Definition 4.3.8.
The notation f⁻¹(Q) denotes a set, not a function, and it exists even if the
function f⁻¹ does not exist. In particular, the use of the notation f⁻¹(Q) should not
be taken as implying that f has an inverse.

Moreover, suppose that the inverse function f⁻¹ does exist. Then the notation
f⁻¹(Q) has two meanings, which are the inverse image of Q under f and the image
of Q under f⁻¹. The former of these meanings is the set {a ∈ A | f(a) ∈ Q}, and the
latter of these meanings is the set {a ∈ A | a = f⁻¹(q) for some q ∈ Q}. Fortunately,
as the reader can verify, these two sets are equal, and so there is no ambiguity in the
meaning of the notation f⁻¹(Q) in those cases when f⁻¹ exists.

We note that if f: A → B has an inverse f⁻¹: B → A, then f⁻¹(f(x)) = x for all
x ∈ A and f(f⁻¹(x)) = x for all x ∈ B. Another way of stating the relation between
f and f⁻¹ is to say that y = f⁻¹(x) if and only if x = f(y) for all y ∈ A and x ∈ B.
This latter formulation will not be particularly useful to us in the construction of
rigorous proofs, but we mention it because the reader has likely encountered it in
precalculus and calculus courses, for example where the natural logarithm function
ln is defined by saying that y = ln x if and only if x = e^y. Moreover, we observe that
to say y = f⁻¹(x) if and only if x = f(y) for all y ∈ A and x ∈ B means that f⁻¹ is
obtained from f by interchanging the roles of x and y, a fact that has a very important
application if we look at the particular case where A,B ⊆ R. In that case, the graph of
f⁻¹ can be obtained from the graph of f by reflecting the x-y plane in the line y = x,
which precisely has the effect of interchanging the roles of x and y. See Figure 4.3.1
for an example of such graphs.


Fig. 4.3.1. 


As seen in the following example, some functions have neither right nor left 
inverse, some have only one but not the other, and some have both. Moreover, if a 
function has only a right inverse or a left inverse but not both, the right inverse or left 
inverse need not be unique. 


Example 4.3.9. 

(1) Let k: (0,1) — (3,5) be defined by k(x) = 2x+3 for all x € (0,1). We claim 
that k has an inverse, the function j: (3,5) — (0,1) defined by j(x) = +33 for all 
x € (3,5). We compute j(k(x)) = SS =x for all x € (0,1), and hence jok = 
10,1). Similarly, we compute k(j(x)) =2- 453. +3 =x for all x € (3,5), and hence 
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ko j =1@ 55). Therefore j is both a right inverse and a left inverse for k, and hence it 
is an inverse for k. We conclude that j = k~!. 

(2) Let f: R — [0,) be defined by f(x) = x? for all x € R. This function has 
no left inverse, but many right inverses, of which we will see two. Let g,h: [0,00) > 
R be defined by g(x) = \/x and h(x) = —,/x for all x € [0,0¢). Both g and h are 
right inverses for f, because (f 0 g)(x) = f(g(x)) = (\/x)? =x for all x € [0,), and 
(f oh)(x) = f(h(x)) = (—/x)* =x for all x € [0,00). To see that f has no left inverse, 
suppose to the contrary that f has a left inverse m: [0,0¢) — R. How should we define 
m(9)? Because mis a left inverse for f, we know that mo f = 1p. Hence m(f(x)) =x 
for all x € R. We would then need to have m(9) = m(37) = (mo f)(3) = 3, but we 
would also need to have m(9) = m((—3)) = (mo f)(—3) = —3. Therefore there is 
no possible way to define m(9), and hence m does not exist. It follows that f has no 
left inverse. (Observe that we could have used any other positive number instead of 
9.) 

(3) Let p: [0,∞) → ℝ be defined by p(x) = x² for all x ∈ [0,∞). Then p has no
right inverse, but many left inverses, of which we will see two. Let q,r: ℝ → [0,∞)
be defined by

q(x) = { √x, if x ≥ 0          r(x) = { √x,    if x ≥ 0
       { 1,  if x < 0,                { sin x, if x < 0.

Both q and r are left inverses for p, because (q ∘ p)(x) = q(p(x)) = √(x²) = x for all
x ∈ [0,∞), and (r ∘ p)(x) = r(p(x)) = √(x²) = x for all x ∈ [0,∞). To see that p has no
right inverse, suppose to the contrary that p has a right inverse u: ℝ → [0,∞). How
should we define u(−4)? Because u is a right inverse for p, we know that p ∘ u = 1_ℝ.
Hence p(u(x)) = x for all x ∈ ℝ. Therefore (u(x))² = x for all x ∈ ℝ. Hence we would
need to have (u(−4))² = −4, which is impossible, because u(−4) is a real number,
and no real number squared is negative. Therefore there is no possible way to define
u(−4), and hence u does not exist. It follows that p has no right inverse.

(4) Let s: ℝ → ℝ be defined by s(x) = x² for all x ∈ ℝ. The function s has no
left inverse by the same argument used to show that the function f in Part (2)
of this example had no left inverse, and s has no right inverse by the same argument
used to show that the function p in Part (3) of this example had no right inverse. ◊
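
The computations in Parts (2) and (3) of this example can be spot-checked numerically. The following Python sketch is only an illustration under stated assumptions: the names f, g, h and q are transcriptions of the functions above, the sample points are chosen arbitrarily, and equality of floating-point numbers is tested up to a small tolerance. A finite check of this kind is evidence, not a proof.

```python
import math

def f(x):            # f(x) = x^2, as in Part (2); on [0, oo) this is also p
    return x * x

def g(x):            # g(x) = sqrt(x), a right inverse for f
    return math.sqrt(x)

def h(x):            # h(x) = -sqrt(x), another right inverse for f
    return -math.sqrt(x)

def q(x):            # q from Part (3): sqrt(x) for x >= 0, and 1 for x < 0
    return math.sqrt(x) if x >= 0 else 1.0

samples = [0.0, 0.25, 1.0, 2.0, 9.0]   # arbitrary points of [0, oo)

# f o g and f o h agree with the identity at every sample point.
right_ok = all(
    math.isclose(f(g(x)), x, abs_tol=1e-12)
    and math.isclose(f(h(x)), x, abs_tol=1e-12)
    for x in samples
)

# q o p agrees with the identity on [0, oo).
left_ok = all(math.isclose(q(f(x)), x, abs_tol=1e-12) for x in samples)

# g is not a left inverse for f on all of R: g(f(-3)) is 3 rather than -3.
fails_at_minus_three = g(f(-3.0)) != -3.0
```

The last line mirrors the argument about m(9): any candidate left inverse for f must send 9 to both 3 and −3, and a single function can do only one of these.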


Exercises 


Exercise 4.3.1. For each pair of functions f and g given below, find formulas for
f ∘ g and g ∘ f (simplifying when possible).


(1) Let f: ℝ → ℝ be defined by f(x) = e^x for all x ∈ ℝ, and let g: ℝ → ℝ be
defined by g(x) = sin x for all x ∈ ℝ.

(2) Let f: (0,0) — (0,0°) be defined by f(x) =x’ for all x € R, and let 
g: (0,00) — (0,00) be defined by g(x) = x~? for all x € (0,09). 

(3) Let f: R— [0,0) be defined by f(x) =.x° for all x CR, and let g: [(0,0) +R 
be defined by g(x) = W/x for all x € [0,). 

(4) Let f: ℝ → ℝ be defined by f(x) = ⌊x⌋ for all x ∈ ℝ, and let g: ℝ → ℝ be
defined by g(x) = ⌈x⌉ for all x ∈ ℝ, where ⌊x⌋ and ⌈x⌉ are respectively the
greatest integer less than or equal to x and the least integer greater than or
equal to x.


Exercise 4.3.2. For each of the following functions f: ℝ → ℝ, find functions
g,h: ℝ → ℝ, neither of which is the identity map, such that f = h ∘ g.


() f(x) = Wx+7 for allx ER. 
(2) f(x) = /*+4+7 for allx ER. 
x, if 0S 


” fey= {0 ifx <0. 
3 . 
(4) fey= {2 ifO0<x 


x, ifx<0. 


Exercise 4.3.3. Let f,g: ℝ → ℝ be defined by

f(x) = { 1 − 2x, if x ≥ 0          g(x) = { 3x,    if x ≥ 0
       { |x|,    if x < 0,                { x − 1, if x < 0.

Find f ∘ g and g ∘ f.


Exercise 4.3.4.


(1) Find two functions h,k: ℝ → ℝ such that neither h nor k is a constant map,
but k ∘ h is a constant map.
(2) Find two functions s,t: ℝ → ℝ such that s ≠ 1_ℝ and t ≠ 1_ℝ, but t ∘ s = 1_ℝ.


Exercise 4.3.5. [Used in Theorem 6.3.11 and Theorem 6.6.5.] Let A, B and C be sets, let
U ⊆ A and V ⊆ C be subsets, and let f: A → B and g: B → C be functions. Prove
that

(g ∘ f)(U) = g(f(U)) and (g ∘ f)⁻¹(V) = f⁻¹(g⁻¹(V)).


Exercise 4.3.6. Let A, B and C be sets, and let f: A → B and g: B → C be functions.
Suppose that f and g have inverses. Prove that g ∘ f has an inverse, and that
(g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.


Exercise 4.3.7. Find two right inverses for each of the following functions.


(1) Let h: ℝ → [0,∞) be defined by h(x) = |x| for all x ∈ ℝ.
(2) Let k: R — [1,c) be defined by k(x) = e* for allx ER. 


Exercise 4.3.8. Find two left inverses for each of the following functions.


(1) Let f: [0,∞) → ℝ be defined by f(x) = x² + 4 for all x ∈ [0,∞).
(2) Let g: ℝ → ℝ be defined by g(x) = e^x for all x ∈ ℝ.


Exercise 4.3.9. Let h,k: ℝ → ℝ be defined by

h(x) = { 4x + 1, if x ≥ 0          k(x) = { 3x,    if x ≥ 0
       { x,      if x < 0,                { x + 3, if x < 0.

Find an inverse for k ∘ h.


Exercise 4.3.10. Let A and B be sets, and let f: A → B be a function. Prove that if f
has two distinct left inverses then it has no right inverse, and that if f has two distinct 
right inverses then it has no left inverse. 


Exercise 4.3.11. Let B be a set, let A_1,...,A_k be sets for some k ∈ ℕ, let U_i ⊆ A_i be
a subset for all i ∈ {1,...,k} and let f: B → A_1 × ··· × A_k be a function. Prove that

f⁻¹(U_1 × ··· × U_k) = ⋂_{i=1}^{k} (f_i)⁻¹(U_i),

where the f_i are the coordinate functions of f.


Exercise 4.3.12. Let B be a set, let A_1,...,A_k be sets for some k ∈ ℕ and let h_i: B →
A_i be a function for each i ∈ {1,...,k}. Prove that there is a unique function g: B →
A_1 × ··· × A_k such that π_i ∘ g = h_i for all i ∈ {1,...,k}, where π_i: A_1 × ··· × A_k → A_i
is the projection map. This exercise can be represented by the following commutative
diagram.

[Commutative diagram: g: B → A_1 × ··· × A_k, π_i: A_1 × ··· × A_k → A_i and h_i: B → A_i.]


Exercise 4.3.13. This exercise and the next give examples of definitions of functions 
by universal property. Rather than defining what a certain function is, we state how 
it should behave, and then prove that there exists a function satisfying the given be- 
havior. Such constructions are important in category theory, a branch of mathematics 
that provides a useful (though abstract) language for many familiar mathematical 
ideas, and has applications to various aspects of mathematics, logic and computer 
science. See [AM75] or [Kri81] for an introduction to category theory, and [Pie91] 
for some uses of category theory in computer science. 

Let A and B be sets, and let f,g: A → B be functions. Prove that there exist a set
E and a function e: E → A such that f ∘ e = g ∘ e, and that for any set C and function
h: C → A such that f ∘ h = g ∘ h, there is a unique function t: C → E such that
h = e ∘ t. This last condition is represented by the following commutative diagram.
The function e is called an equalizer of f and g. To define E, consider subsets of A.

[Commutative diagram: e: E → A, f,g: A → B, h: C → A and t: C → E.]


Exercise 4.3.14. This exercise is similar to Exercise 4.3.13. Let A, B and C be sets,
and let f: A → C and g: B → C be functions. Prove that there exist a set P and
functions h: P → A and k: P → B such that f ∘ h = g ∘ k, and that for any set X and
functions s: X → A and t: X → B such that f ∘ s = g ∘ t, there is a unique function
u: X → P such that s = h ∘ u and t = k ∘ u. This last condition is represented by the
following commutative diagram. The set P together with the functions h and k are
called a pullback of f and g. To define P, consider subsets of A × B.

[Commutative diagram: h: P → A, k: P → B, f: A → C, g: B → C, s: X → A, t: X → B and u: X → P.]


4.4 Injectivity, Surjectivity and Bijectivity 


As we saw in Example 4.3.9, there exist functions with neither right inverse nor left 
inverse; others with a right inverse but not a left inverse; others with a left inverse 
but not a right inverse; and yet others with both a right and a left inverse, and hence 
with an inverse by Lemma 4.3.7 (2). Unfortunately, it is not always easy to verify 
whether a function has a right inverse, left inverse or both directly from the definition, 
because such verification entails finding a suitable candidate for the appropriate type 
of inverse, and doing so for any but the simplest functions is often quite difficult, 
and at times virtually impossible. Given the importance of inverse functions in many 
parts of mathematics, it would be very nice if there were some convenient criteria by 
which to check whether a function in principle has a right inverse, left inverse or both 
without having to produce the desired function. Remarkably, there are such criteria, 
as seen in Theorem 4.4.5 below. 

To understand the criteria for the existence of right inverses and left inverses, we 
start with an example. 


Example 4.4.1. Let P be the set of all people, and let m: P → P be the function that
assigns to each person her mother. Does this function have a right inverse or a left
inverse? Suppose first that g: P → P is a right inverse for m. That would mean that
m ∘ g = 1_P, and therefore m(g(x)) = x for every x ∈ P. Let y ∈ P, and suppose that
y is a man. Then m(g(y)) = y, which would mean that y is the mother of g(y), and
that is not possible, because y cannot be anyone’s mother. Therefore m has no right
inverse. Observe that the obstacle to finding a right inverse for m is that there are
objects in the codomain (namely, all men and some women) who are not in the range
of m (which is the set of mothers).

Now suppose that h: P → P is a left inverse for m. That would mean that h ∘ m =
1_P, and therefore h(m(x)) = x for every person x. Here we will encounter a different
problem than with the proposed right inverse. Let a,b ∈ P, and suppose that a and b
are siblings. Then m(a) = m(b), and hence h(m(a)) = h(m(b)). Because h(m(x)) = x
for every x ∈ P, we deduce that a = b, which is a contradiction. Hence m has no
left inverse. The obstacle to finding a left inverse for m is that there are two different
objects in the domain (namely, a pair of siblings) that are mapped to the same element
of the codomain (namely, their mother). ◊


It turns out that the two problems identified in Example 4.4.1 are the only obsta- 
cles to finding right inverses and left inverses, respectively. We now give names to 
functions that do not have these problems. 


Definition 4.4.2. Let A and B be sets, and let f: A → B be a function.


1. The function f is injective (also called one-to-one or monic) if x ≠ y implies
f(x) ≠ f(y) for all x,y ∈ A; equivalently, if f(x) = f(y) implies x = y for all
x,y ∈ A.

2. The function f is surjective (also called onto or epic) if for every b ∈ B,
there exists some a ∈ A such that f(a) = b; equivalently, if f(A) = B.

3. The function f is bijective if it is both injective and surjective.


Observe that a function is surjective if and only if its range equals its codomain. 
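
For functions between finite sets, the three conditions in the definition of injective, surjective and bijective can be tested mechanically. The following Python sketch assumes, purely for illustration, that a function is stored as a dictionary sending each element of the domain to its image; the helper names is_injective, is_surjective and is_bijective and the sample functions f1, f2, f3 are hypothetical and not part of the text.

```python
def is_injective(f):
    """f is a dict mapping each element of the domain to its image."""
    images = list(f.values())
    # Injective: no value in the codomain is hit by two different inputs.
    return len(images) == len(set(images))

def is_surjective(f, codomain):
    """Surjective means that the range f(A) equals the whole codomain B."""
    return set(f.values()) == set(codomain)

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

B = {"x", "y", "z"}
f1 = {1: "x", 2: "y"}                    # injective, not surjective
f2 = {1: "x", 2: "y", 3: "z", 4: "z"}    # surjective, not injective
f3 = {1: "x", 2: "y", 3: "z"}            # bijective
```

Observe that the codomain must be passed explicitly to is_surjective, because surjectivity depends on the choice of codomain and not only on the rule of assignment.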

There exist functions that are both injective and surjective, that are surjective but 
not injective, that are injective but not surjective and that are neither injective nor 
surjective. Examples of such functions are seen graphically in Figure 4.4.1, and via 
formulas in the following example respectively. 


injective surjective bijective 


Fig. 4.4.1.

Example 4.4.3.


(1) Let k: [0,∞) → [0,∞) be defined by k(x) = x² for all x ∈ [0,∞). This function
is surjective and injective, and hence bijective. First, we show that k is injective. Let
x,y ∈ [0,∞). Suppose that k(x) = k(y). Then x² = y². It follows that √(x²) = √(y²),
and because x ≥ 0 and y ≥ 0, we deduce that x = √(x²) = √(y²) = y. Hence k is injective.
Second, we show that k is surjective. Let b ∈ [0,∞). Then √b ∈ [0,∞), and so
k(√b) = (√b)² = b. Hence k is surjective.

(2) Let g: [0,∞) → ℝ be defined by g(x) = x² for all x ∈ [0,∞). This function is
injective but not surjective. The proof of the injectivity of g is the same as the proof
of the injectivity of the function k in Part (1) of this example. The reason that g is not
surjective is that g(a) ≠ −2 for any a ∈ [0,∞), though −2 is in the codomain of g.

(3) Let h: ℝ → [0,∞) be defined by h(x) = x² for all x ∈ ℝ. This function is
surjective but not injective. The proof of the surjectivity of h is the same as the proof
of the surjectivity of the function k in Part (1) of this example. The reason h is not
injective is that h(−3) = 9 = h(3) even though −3 ≠ 3. (Observe that instead of
±3 we could have used ±a for any positive number a, but a single instance where
the definition of injectivity fails is sufficient.)

(4) Let f: ℝ → ℝ be defined by f(x) = x² for all x ∈ ℝ. This function is neither
injective nor surjective, which is seen using the same arguments as the corresponding
arguments for g and h in Parts (2) and (3) of this example. ◊


Observe from Example 4.4.3 that injectivity and surjectivity very much depend 
upon the choice of domain and codomain of a function. That is one of the reasons 
why we need to specify the domain and codomain when we define a function. 

In many texts, especially at the elementary level, the terms “one-to-one” and 
“onto” are used instead of “injective” and “surjective,” respectively, and the reader 
should therefore be familiar with the former terms, though the author finds the latter 
terms (also widely used) to be preferable. The term “one-to-one” is awkward, and 
the word “onto” is a preposition (in contrast to the adjective “one-to-one”), and as
such is not grammatically parallel to “one-to-one.” By contrast, the two adjectives
“injective” and “surjective” are grammatically parallel, reflecting the parallel roles
of these two concepts, as the reader will soon see. Moreover, some texts use the
word “onto” as if it were an adjective, leading to grammatically problematic phrases
such as “the function f is a one-to-one and onto function.” Other texts are careful
to use “onto” as a preposition, leading to awkward (though correct) phrases such as
“the function f is a one-to-one function from A onto B,” which again make the two
concepts seem not parallel. (If the reader really prefers to use prepositions rather
than adjectives to describe functions, the author’s proposed scheme would be that
an arbitrary function f: A → B is described as a function from A to B; an injective
function is described as a function from A into B; a surjective function is described 
as a function from A onto B; and a bijective function is described as a function from 
A unto B. The author would not necessarily recommend the use of this scheme, but 
it is grammatically consistent.)

One way of thinking about injectivity, surjectivity and bijectivity is as follows.
Let f: A → B be a function. The function f is injective if and only if for each b ∈
B, there is at most one element in the inverse image f⁻¹({b}); the function f is
surjective if and only if for each b ∈ B, there is at least one element in the inverse
image f⁻¹({b}); the function f is bijective if and only if for each b ∈ B, there is
precisely one element in the inverse image f⁻¹({b}). Consider now the special case
of a function f: ℝ → ℝ. Then the function f is injective if and only if each horizontal
line in the plane intersects its graph at most once; see Figure 4.4.2 (i). The function
f is surjective if and only if each horizontal line intersects its graph at least once;
see Figure 4.4.2 (ii). The function f is bijective if and only if each horizontal line
intersects its graph once and only once.


Fig. 4.4.2. 
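
The description of injectivity, surjectivity and bijectivity in terms of the size of each inverse image f⁻¹({b}) translates directly into a count for functions between finite sets. The Python sketch below is an illustration only; it assumes a function is stored as a dictionary, and the names fiber_sizes, classify, square and nonneg_square are hypothetical.

```python
from collections import Counter

def fiber_sizes(f, codomain):
    """Return {b: number of a with f(a) = b} for every b in the codomain."""
    counts = Counter(f.values())
    return {b: counts[b] for b in codomain}   # a Counter gives 0 for missing keys

def classify(f, codomain):
    sizes = list(fiber_sizes(f, codomain).values())
    return {
        "injective": all(n <= 1 for n in sizes),    # at most one preimage
        "surjective": all(n >= 1 for n in sizes),   # at least one preimage
        "bijective": all(n == 1 for n in sizes),    # precisely one preimage
    }

# x -> x^2 on {-2, ..., 2} with codomain {0, ..., 4}
square = {x: x * x for x in range(-2, 3)}
# x -> x^2 on {0, 1, 2} with codomain {0, 1, 4}
nonneg_square = {x: x * x for x in range(0, 3)}
```

The two sample functions use the same rule on different domains and codomains, in the spirit of Example 4.4.3: the first is neither injective nor surjective, whereas the second is bijective.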


There are standard strategies for proving that a function is injective and for proving
that it is surjective. Let f: A → B be a function. If we wish to prove that f is injective, then
we need to show that f(x) = f(y) implies x = y for all x,y ∈ A. As usual, if we need
to show that something is true for all x,y ∈ A, we will choose arbitrary x and y, and
then prove the desired property for this choice. Hence, a proof of the injectivity of f
typically has the following form.


Proof. Let x,y ∈ A. Suppose that f(x) = f(y).


(argumentation) 


Then x = y. Hence f is injective. 


If we wish to prove that f is surjective, we need to show that for every b ∈ B,
there exists some a ∈ A such that f(a) = b. A proof of the surjectivity of f would
therefore have the following form.


Proof. Let b ∈ B.
Let a = ....


(argumentation)


Then b = f(a). Hence f is surjective.


We will use the above strategies repeatedly, starting with the proof of the follow- 
ing lemma, which shows that composition of functions behaves nicely with respect 
to injectivity, surjectivity and bijectivity. 


Lemma 4.4.4. Let A, B and C be sets, and let f: A → B and g: B → C be functions.


1. If f and g are injective, then g ∘ f is injective.
2. If f and g are surjective, then g ∘ f is surjective.
3. If f and g are bijective, then g ∘ f is bijective.


Proof.


(1). Suppose that f and g are injective. We wish to show that g ∘ f: A → C
is injective. Let x,y ∈ A. We will show that (g ∘ f)(x) = (g ∘ f)(y) implies x = y.
Suppose that (g ∘ f)(x) = (g ∘ f)(y). Then g(f(x)) = g(f(y)). Because g is injective,
we deduce that f(x) = f(y). Because f is injective, we deduce that x = y.


(2). Suppose that f and g are surjective. We wish to show that g ∘ f: A → C
is surjective. Let c ∈ C. We will show that there exists some element a ∈ A such
that (g ∘ f)(a) = c. Because c ∈ C and g is surjective, there is some b ∈ B such
that g(b) = c. Because f is surjective, we know that there is some a ∈ A such that
f(a) = b. It follows that (g ∘ f)(a) = g(f(a)) = g(b) = c.

(3). This part is derived easily from Parts (1) and (2) of this lemma.


As seen in Exercise 4.4.14, the converse to each of the parts of Lemma 4.4.4 is 
not true, though a partial result does hold. 
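
Lemma 4.4.4 can be checked exhaustively for very small sets, which is a useful sanity test though of course not a substitute for the proof. The Python sketch below assumes functions are stored as dictionaries on the sets {0,...,n−1} for n ≤ 3; the helper names all_functions, compose, injective and surjective are hypothetical.

```python
from itertools import product

def all_functions(domain, codomain):
    """Yield every function domain -> codomain as a dict."""
    domain = list(domain)
    for values in product(list(codomain), repeat=len(domain)):
        yield dict(zip(domain, values))

def compose(g, f):
    """Return g o f, where f: A -> B and g: B -> C are dicts."""
    return {a: g[f[a]] for a in f}

def injective(f):
    return len(set(f.values())) == len(f)

def surjective(f, codomain):
    return set(f.values()) == set(codomain)

lemma_holds = True
for na, nb, nc in product([1, 2, 3], repeat=3):
    A, B, C = range(na), range(nb), range(nc)
    for f in all_functions(A, B):
        for g in all_functions(B, C):
            gf = compose(g, f)
            # Part (1): injective f and g force g o f to be injective.
            if injective(f) and injective(g) and not injective(gf):
                lemma_holds = False
            # Part (2): surjective f and g force g o f to be surjective.
            if surjective(f, B) and surjective(g, C) and not surjective(gf, C):
                lemma_holds = False
```

After the loops finish, lemma_holds is still True, which says only that no counterexample exists among sets with at most three elements.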

The following theorem, which is extremely useful throughout mathematics (and 
is perhaps the author’s favorite theorem in this text), answers the question posed at 
the start of this section concerning criteria for the existence of inverse functions. 


Theorem 4.4.5. Let A and B be non-empty sets, and let f: A → B be a function.


1. The function f has a right inverse if and only if f is surjective.
2. The function f has a left inverse if and only if f is injective.
3. The function f has an inverse if and only if f is bijective.

Proof.


(1). Suppose that f has a right inverse g. Then f ∘ g = 1_B. We wish to show that
f is surjective. Let b ∈ B. We need to find an element a ∈ A such that f(a) = b. Let
a = g(b). Then f(a) = f(g(b)) = (f ∘ g)(b) = 1_B(b) = b.

Now suppose that f is surjective. We wish to show that f has a right inverse,
which means that we need to find a function h: B → A such that f ∘ h = 1_B. We
define h as follows. For each b ∈ B, the surjectivity of f implies that there is at least
one element a ∈ A such that f(a) = b; let h(b) = a for some choice of such a (it
doesn’t matter which one). It is now true by definition that f(h(b)) = b for all b ∈ B.
Hence f ∘ h = 1_B.


(2). Left to the reader in Exercise 4.4.9.


(3). This part follows from Parts (1) and (2) of this theorem, together with
Lemma 4.3.7 (2).


The alert reader will have noticed that in the proof of Part (1) of Theorem 4.4.5,
we had to choose, simultaneously, one element a ∈ A such that f(a) = b, for each
b ∈ B. That is, we implicitly made use of the Axiom of Choice, which was discussed
in Section 3.5, and reformulated in terms of functions in Section 4.1. As is common,
for the sake of brevity and in order to avoid distraction from the essential idea of the
proof of Theorem 4.4.5, we did not explicitly make use of the Axiom of Choice in
that proof, though it would certainly have been possible to have done so. For example,
we could have written: “We define h as follows. The surjectivity of f implies
that the set f⁻¹({b}) is non-empty for each b ∈ B. Then {f⁻¹({b})}_{b∈B} is a family
of non-empty sets. By the Axiom of Choice (Theorem 4.1.5) there is a function
h: B → ⋃_{b∈B} f⁻¹({b}) such that h(b) ∈ f⁻¹({b}) for all b ∈ B. It is now true by
definition that f(h(b)) = b for all b ∈ B. Hence f ∘ h = 1_B.” The reader might wonder
whether it would have been possible to prove Theorem 4.4.5 (1) without the Axiom
of Choice or something equivalent, but it turns out that that would not have been possible,
because Theorem 4.4.5 (1) is in fact equivalent to the Axiom of Choice, as seen
in Exercise 4.4.19. Interestingly, as the reader will see if she does Exercise 4.4.9, the
proof of Theorem 4.4.5 (2) does not require the Axiom of Choice.
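
When A and B are finite, the choice made in the proof of Theorem 4.4.5 (1) can be carried out explicitly, one element at a time, and no appeal to the Axiom of Choice is needed. The following Python sketch illustrates the construction of a right inverse under that finiteness assumption; the function name right_inverse and the sample data are hypothetical, and "the first preimage encountered" is just one of many acceptable choices.

```python
def right_inverse(f, codomain):
    """Return h: B -> A with f(h(b)) = b for all b, assuming f is surjective.

    f is a dict representing a function A -> B on finite sets.
    """
    h = {}
    for a, b in f.items():
        h.setdefault(b, a)   # keep the first preimage found; any choice works
    if set(h) != set(codomain):
        raise ValueError("f is not surjective, so it has no right inverse")
    return h

f = {1: "x", 2: "y", 3: "x"}      # surjective onto {"x", "y"}, not injective
B = {"x", "y"}
h = right_inverse(f, B)

# f o h is the identity map on B ...
is_right_inverse = all(f[h[b]] == b for b in B)
# ... but h o f is not the identity map on A, because f is not injective.
is_left_inverse = all(h[f[a]] == a for a in f)
```

Here is_right_inverse is True and is_left_inverse is False, in agreement with the theorem: the sample function is surjective but not injective.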

The following result concerning “cancellation” of functions is a typical applica- 
tion of Theorem 4.4.5. 


Theorem 4.4.6. Let A and B be non-empty sets, and let f: A → B be a function.


1. The function f is injective if and only if f ∘ g = f ∘ h implies g = h for all
functions g,h: Y → A for all sets Y.

2. The function f is surjective if and only if g ∘ f = h ∘ f implies g = h for all
functions g,h: B → X for all sets X.


Proof. We will prove Part (2), leaving the remaining part to the reader in Exercise 4.4.15.


(2). First assume that f is surjective. Let g,h: B → X be functions such that
g ∘ f = h ∘ f for some set X. By Theorem 4.4.5 (1), the function f has a right inverse
q: B → A. Then (g ∘ f) ∘ q = (h ∘ f) ∘ q. Using Lemma 4.3.5 and the definition of
right inverses, it follows that g ∘ (f ∘ q) = h ∘ (f ∘ q), and hence g ∘ 1_B = h ∘ 1_B, and
therefore g = h.

Now assume f is not surjective. Let b ∈ B be an element that is not in the range
of f. Let X = {1,2}, and let g,h: B → X be defined by g(y) = 1 for all y ∈ B, and by
h(y) = 1 for all y ∈ B − {b} and h(b) = 2. It can then be verified that g ∘ f = h ∘ f,
even though g ≠ h. The desired result now follows using the contrapositive.
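
The functions g and h used in the second half of the proof of Theorem 4.4.6 (2) can be written out concretely for a small non-surjective function. The Python sketch below is one such instance; the particular sets A and B and the function f are hypothetical choices made only for illustration.

```python
A = [0, 1]
B = ["p", "q", "r"]
f = {0: "p", 1: "q"}                 # "r" is not in the range of f

# An element b of B that is not in the range of f.
b = next(y for y in B if y not in f.values())

g = {y: 1 for y in B}                # g is constantly 1
h = {y: 1 for y in B}
h[b] = 2                             # h differs from g only at b

def compose(outer, inner):
    """Return outer o inner for functions stored as dicts."""
    return {a: outer[inner[a]] for a in inner}

same_after_f = compose(g, f) == compose(h, f)   # g o f = h o f
different = g != h                              # yet g and h are not equal
```

Because f never takes the value b, the single point where g and h disagree is invisible to g ∘ f and h ∘ f.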


Exercises 


Exercise 4.4.1. Is each of the following functions injective, surjective, both or nei- 
ther? Prove your answers. Feel free to use standard properties of functions such as 
polynomials, logarithms and the like. 


(1) Let t: (1,∞) → ℝ be defined by t(x) = ln x for all x ∈ (1,∞).

(2) Let s: ℝ → ℝ be defined by s(x) = x⁴ − 5 for all x ∈ ℝ.

(3) Let g: [0,0c) — [0, 1) be defined by g(x) = 745 for all x € [0,°). 
(4) Let k: R? — R be defined by k((x,y)) =x? +’ for all (x,y) € R?. 
(5) Let Q: ℕ → P(ℕ) be defined by Q(n) = {1,2,...,n} for all n ∈ ℕ.


Exercise 4.4.2. In each of the four cases below, we are given a function f such that 
f(x) = 3x+5 for all x in the domain. Is each function injective, surjective, both or 
neither? 


a 


(l) f: ZZ. (3) f:Q—1 
(2) f:Q-Q. (4) f:RAR. 


Exercise 4.4.3. [Used in Example 6.5.3.] Let f: R — (—1,1) be defined by 


eG 
=, ifx>0 
fe) = {BE cies 


ree ifx <0. 


Prove that f is bijective. Use only the methods we have used in this text, including 
the standard algebraic properties of the real numbers; do not use calculus. 


Exercise 4.4.4. Let A and B be sets, and let S ⊆ A be a subset. We will use various
definitions from Section 4.1.


(1) Prove that the identity map 1_A: A → A is bijective.

(2) Prove that the inclusion map j: S → A is injective.

(3) Let f: A → B be a function. Suppose that f is injective. Is the restriction f|_S
necessarily injective? Give a proof or a counterexample.

(4) Let g: A → B be a function. Suppose that g is surjective. Is the restriction g|_S
necessarily surjective? Give a proof or a counterexample.

(5) Let h: S → B be a function, and let H: A → B be an extension of h. Suppose
that h is injective. Is H necessarily injective? Give a proof or a counterexample.

(6) Let k: S → B be a function, and let K: A → B be an extension of k. Suppose
that k is surjective. Is K necessarily surjective? Give a proof or a counterexample.

(7) Prove that the projection maps π_1: A × B → A and π_2: A × B → B are surjective.
Are the projection maps injective?


Exercise 4.4.5. Let A and B be sets. Prove that there is a bijective function f: A × B →
B × A.


Exercise 4.4.6. [Used in Section 3.3.] Let A, B and C be sets. Prove that there is a
bijective function g: (A × B) × C → A × (B × C).


Exercise 4.4.7. Let A be a set. Let φ: P(A) → P(A) be defined by φ(X) = A − X for
all X ∈ P(A). Prove that φ is bijective.


Exercise 4.4.8. [Used in Exercise 6.7.9.] This exercise makes use of Exercise 2.4.3.
Let

L = {(a,b) ∈ ℕ × ℕ | a and b are relatively prime},

and let U,D: L → L be defined by U((a,b)) = (a+b,b) and D((a,b)) = (a,a+b)
for all (a,b) ∈ L. These functions are well-defined by Exercise 2.4.3.

(1) Prove that (1,1) ∉ U(L) and (1,1) ∉ D(L).

(2) Prove that U((a,b)) ≠ (a,b) and D((a,b)) ≠ (a,b) for all (a,b) ∈ L.

(3) Prove that U and D are injective.

(4) Prove that U(L) ∩ D(L) = ∅.


Exercise 4.4.9. [Used in Theorem 4.4.5.] Prove Theorem 4.4.5 (2). 


Exercise 4.4.10. In Theorem 4.4.5 it was assumed that A and B are non-empty sets. 
Which parts of the theorem still hold when A or B is empty? (Do not forget the case 
where A and B are both empty.) 


Exercise 4.4.11. [Used in Exercise 6.5.15 and Theorem 6.6.5.] Let A and B be sets,
let P,Q ⊆ A be subsets and let f: A → B be a function. Suppose that f is injective.
Prove that f(P − Q) = f(P) − f(Q). [Use Exercise 4.2.11.]


Exercise 4.4.12. Let A and B be sets, and let f: A → B be a function.


(1) Prove that f is injective if and only if E = f⁻¹(f(E)) for all subsets E ⊆ A.
(2) Prove that f is surjective if and only if F = f(f⁻¹(F)) for all subsets F ⊆ B.


Exercise 4.4.13. [Used in Lemma 6.5.11, Theorem 6.5.13, Theorem 6.6.8 and
Theorem 7.7.10.] Let A and B be sets, and let f: A → B and g: B → A be functions.


(1) Suppose that f is injective, and that g is a left inverse of f. Prove that g is
surjective.

(2) Suppose that f is surjective, and that g is a right inverse of f. Prove that g is
injective.

(3) Suppose that f is bijective, and that g is the inverse of f. Prove that g is
bijective.


Exercise 4.4.14. [Used in Section 4.4.] Let A, B and C be sets, and let f: A → B and
g: B → C be functions.


(1) Prove that if g ∘ f is injective, then f is injective.

(2) Prove that if g ∘ f is surjective, then g is surjective.

(3) Prove that if g ∘ f is bijective, then f is injective, and g is surjective.

(4) Find an example of functions f: A → B and g: B → C such that g ∘ f is
bijective, but f is not surjective, and g is not injective. Hence Parts (1)–(3) of
this exercise are the best possible results.


Exercise 4.4.15. [Used in Theorem 4.4.6.] Prove Theorem 4.4.6 (1). 


Exercise 4.4.16. Let A and B be sets, and let h: A → B be a function. Prove that h is
injective if and only if h(X ∩ Y) = h(X) ∩ h(Y) for all X,Y ⊆ A.


Exercise 4.4.17. Let A and B be sets, and let f: A → B be a function. Prove that f
is surjective if and only if B − f(X) ⊆ f(A − X) for all X ⊆ A.


Exercise 4.4.18. Let A and B be sets, and let f: A → B be a function. As discussed
at the end of Section 4.2, we can think of f as inducing a function f_*: P(A) → P(B),
and a function f*: P(B) → P(A).


(1) Prove that f_* is injective if and only if f is injective.

(2) Prove that f_* is surjective if and only if f is surjective.

(3) Prove that f* is injective if and only if f is surjective.

(4) Prove that f* is surjective if and only if f is injective.

(5) Prove that f_* is bijective if and only if f* is bijective if and only if f is
bijective.

(6) Suppose that f is bijective. Prove that f_* and f* are inverses of each other.


Exercise 4.4.19. [Used in Section 3.5 and Section 4.4.] Suppose that every surjec- 
tive function has a right inverse. Prove the Axiom of Choice. By Exercise 3.5.2, it 
is sufficient to prove the Axiom of Choice for Pairwise Disjoint Sets. Although Ex- 
ercise 3.5.2 was stated for the family of sets versions of the Axiom of Choice, that 
exercise also applies to functions versions of the axiom (which are equivalent to the 
family of sets versions). We did not explicitly state what would be called the Axiom 
of Choice for Pairwise Disjoint Sets—Functions Version, but the reader can figure 
out what that version would be (by comparing the statements of Axiom 3.5.2 and 
Theorem 4.1.5), and make use of that version in this exercise. 


Exercise 4.4.20. [Used in Exercise 4.4.21.] Let A be a non-empty set, and let f: A →
A be a function. Suppose that f is bijective. For each n ∈ ℕ, let f^n denote the function
A → A given by

f^n = f ∘ ··· ∘ f   (n times).

The function f^n is the n-fold iteration of f. (Such a definition, while intuitively reasonable,
is not entirely rigorous, because the use of ··· is not rigorous; a completely
rigorous definition will be given in Example 6.4.2 (2).)

We now extend the definition of f^n to all n ∈ ℤ. Let f^0 = 1_A. Because f is
bijective, it follows from Exercise 6.4.4 that f^n is bijective. Hence f^n has an inverse.
For each n ∈ ℕ, let f^(−n) = (f^n)⁻¹. It can be verified that f^a ∘ f^b = f^(a+b) and (f^a)^b =
f^(ab) for all a,b ∈ ℤ, though we omit the details for the sake of getting to the interesting
part of this exercise; the interested reader can find the details of the first of these
equalities, though in a different setting, in the proof of [Blo11, Lemma 2.5.9].


(1) Let x,y,z ∈ A. Prove that the following three properties hold.
a. x = f^n(x) for some n ∈ ℤ.
b. If y = f^n(x) for some n ∈ ℤ, then x = f^m(y) for some m ∈ ℤ.
c. If y = f^n(x) for some n ∈ ℤ, and z = f^m(y) for some m ∈ ℤ, then z =
f^p(x) for some p ∈ ℤ.
(In Section 5.3 we will see that these three properties are particularly important.)
(2) Let a ∈ A. The orbit of a with respect to f, denoted O_a, is the set defined by
O_a = {f^n(a) | n ∈ ℤ}.
Let x,y ∈ A. Prove that the following properties hold.
a. If y = f^m(x) for some m ∈ ℤ, then O_x = O_y.
b. If y ≠ f^n(x) for any n ∈ ℤ, then O_x ∩ O_y = ∅.
c. x ∈ O_y if and only if y ∈ O_x.
d. A = ⋃_{x∈A} O_x.
Putting these observations together, we see that A can be broken up into disjoint
sets, each of which is the orbit of all its members. (Using the terminology
of Section 5.3, we will say that the orbits of f form a partition of A.)
(3) Give an example of a bijective function ℤ → ℤ with infinitely many orbits.
For each r ∈ ℕ, give an example of a bijective function ℤ → ℤ with precisely
r orbits.


Exercise 4.4.21. This exercise makes use of Exercise 4.4.20. Let A be a non-empty
set, and let f: A → A be a function. Suppose that f is bijective. Suppose further that
A is finite; the results in this exercise are valid only for finite sets. Let x,y ∈ A.


(1) Prove that f^m = 1_A for some m ∈ ℕ. Use the fact that because A is finite,
there are only finitely many bijective functions A → A; this fact is proved in
Theorem 7.7.4 (3). Let r ∈ ℕ be the smallest natural number such that f^r =
1_A. (It makes sense intuitively that there is such a smallest natural number;
formally we make use of the Well-Ordering Principle (Theorem 6.2.5).)

(2) Suppose that y = f^i(x) for some i ∈ ℤ. Prove that there is some s ∈ ℕ ∪ {0}
such that y = f^s(x).

(3) Prove that if f^k(x) = x for some k ∈ ℤ, then f^k(w) = w for all w ∈ O_x.

(4) The stabilizer of x with respect to f, denoted f_x, is the set defined by f_x =
{m ∈ ℤ | f^m(x) = x and 0 ≤ m < r}. Suppose that y ∈ O_x. Prove that f_y = f_x.

(5) Prove that there is some v ∈ ℕ such that f^v(x) = x. Use the fact that A is finite.
The order of x with respect to f, denoted n_x, is the smallest q ∈ ℕ such that
f^q(x) = x.

(6) Prove that O_x = {f^0(x), f^1(x), f^2(x),...,f^(n_x − 1)(x)}. Use the Division Algorithm
(Theorem A.5 in the Appendix).

(7) Prove that n_x | r.

(8) Prove that if k ∈ f_x, then n_x | k.

(9) Prove that r = |O_x| · |f_x|.

(10) Prove that r = Σ_{y∈O_x} |f_y|.

(11) Because A is finite, there are finitely many distinct orbits in A. Let B denote
the number of distinct orbits of f. Prove that r · B = Σ_{y∈A} |f_y|.

(12) For each m ∈ {0,...,r−1}, the fixed set of m, denoted A_m, is the set defined
by A_m = {z ∈ A | f^m(z) = z}. Prove that r · B = Σ_{i=0}^{r−1} |A_i|. This result is a
special case of Burnside’s Formula; see [Fra03, Section 17] for details.
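
The quantities defined in Exercises 4.4.20 and 4.4.21 can be computed directly for a bijective function on a small finite set, which may help the reader form an intuition before attempting the proofs. The Python sketch below is an illustration only and proves nothing; the helper names compose, order, orbits and fixed_point_counts, and the sample bijection f, are hypothetical.

```python
def compose(g, f):
    """Return g o f for functions stored as dicts."""
    return {a: g[f[a]] for a in f}

def order(f):
    """Smallest r in N with f^r equal to the identity map (f bijective, A finite)."""
    identity = {a: a for a in f}
    power, r = dict(f), 1
    while power != identity:
        power = compose(f, power)
        r += 1
    return r

def orbits(f):
    """List the orbits of f; forward iteration suffices when A is finite."""
    seen, result = set(), []
    for a in f:
        if a not in seen:
            orbit, x = [], a
            while x not in seen:
                seen.add(x)
                orbit.append(x)
                x = f[x]
            result.append(orbit)
    return result

def fixed_point_counts(f):
    """Return [|A_0|, |A_1|, ..., |A_(r-1)|], where A_m is the fixed set of m."""
    power = {a: a for a in f}            # f^0 is the identity map
    counts = []
    for _ in range(order(f)):
        counts.append(sum(1 for a in power if power[a] == a))
        power = compose(f, power)
    return counts

# A bijection on {1,...,6}: 1 -> 2 -> 3 -> 1, 4 <-> 5, and 6 fixed.
f = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4, 6: 6}
r = order(f)
number_of_orbits = len(orbits(f))
total_fixed = sum(fixed_point_counts(f))
```

For this sample function, r · B and the sum of the sizes of the fixed sets A_0,...,A_(r−1) are both equal to 18, as Part (12) predicts.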


4.5 Sets of Functions 


We now go to one level higher of abstraction than we have seen so far. Until now we 
have looked at one function at a time; now we discuss sets of functions, for example 
the set of all functions from one set to another. Such sets are useful in many branches 
of mathematics, for example linear algebra, and hence are well worth studying. We 
will use sets of functions briefly at the end of Section 6.7, and a bit more exten- 
sively in Section 7.7. The material in this section is among the most conceptually 
difficult in this book, but the reader who has understood the previous material can, 
with sufficient effort, master the present section as well. We start with the following 
definition. 


Definition 4.5.1. Let A and B be sets. The set of all functions A → B is denoted
ℱ(A,B). △


For any sets A and B, we observe that ℱ(A,B) is also a set, where each element of
the set ℱ(A,B) is a function A → B. There is no theoretical problem with having a set
that has elements that are functions, though sometimes it is hard to get an intuitive
picture of what is going on with such sets. Results about sets of functions are proved
no differently from results about sets containing intuitively simpler objects such as
numbers.


Example 4.5.2. 


(1) If A ≠ ∅ and B = ∅, then F(A,B) = ∅. If A = ∅, then F(A,B) = {∅}. If A ≠ ∅
and B ≠ ∅, then F(A,B) ≠ ∅, because there is at least one constant map A → B.

(2) Let A = {1,2} and B = {x,y}. Then F(A,B) = {f,g,h,k}, where the functions
f,g,h,k: A → B are defined by f(1) = x and f(2) = x, by g(1) = x and g(2) = y,
by h(1) = y and h(2) = x, and by k(1) = y and k(2) = y.

(3) The set F(R,R) has a number of useful subsets, including the set C(R,R) of
all continuous functions R → R, and the set D(R,R) of all differentiable functions
R → R. Observe that D(R,R) ⊆ C(R,R) ⊆ F(R,R). We can define some useful
functions between these three sets, for example K: D(R,R) → F(R,R) defined by
K(f) = f′ for all f ∈ D(R,R). We observe that the function K is not injective. For
instance, let f,g ∈ D(R,R) be defined by f(x) = x² + 5 and g(x) = x² + 7 for all
x ∈ R. Then K(f) = K(g), even though f ≠ g. Though it is not obvious, the function
K is also not surjective; in other words, there are functions R → R that do not have
antiderivatives. A proof of this fact is beyond the scope of this book, and can be
found in [Blo11, Example 4.4.11].

(4) We can give an intuitive interpretation of the set F(N,R) as follows. Let
f ∈ F(N,R). Then we obtain a sequence of real numbers by writing f(1), f(2),
f(3), .... Conversely, given a sequence of real numbers a_1, a_2, a_3, ..., we can define
an element g ∈ F(N,R) by setting g(1) = a_1, and g(2) = a_2, and so on. Hence each
element of F(N,R) corresponds to a sequence of real numbers, and conversely. In
fact, the formal definition of a sequence of real numbers is simply an element of
F(N,R). ◊
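
For finite sets, the elements of F(A,B) can be listed mechanically, which may help
in building intuition for Parts (1) and (2) of the example. The following Python
sketch is only an illustration and is not part of the formal development; it stores
a function as a dictionary, and the name all_functions is our own choice.

```python
from itertools import product

def all_functions(domain, codomain):
    """List every function domain -> codomain, each stored as a dict."""
    domain = list(domain)
    # Each tuple of values assigns one element of the codomain to each
    # element of the domain, in order.
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]

functions = all_functions([1, 2], ["x", "y"])
for fn in functions:
    print(fn)                           # the four functions f, g, h, k of Part (2)

print(all_functions([], ["x", "y"]))    # [{}]: only the empty function
print(all_functions([1, 2], []))        # []: no functions into the empty set
```

The last two lines mirror Part (1): an empty domain gives exactly one function, and
a non-empty domain with an empty codomain gives none.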


There are many possible results that can be proved concerning sets of functions; 
we give two typical results. We start with a relatively simple lemma, which will be 
of use later on. 


Lemma 4.5.3. Let A, B, C and D be sets, and let f: A → C and g: B → D be
functions. Suppose that f and g are bijective. Then there is a bijective function from
F(A,B) to F(C,D).


Proof. Because f and g are both bijective, they have inverses f⁻¹ and g⁻¹,
respectively. Let Φ: F(A,B) → F(C,D) be defined by Φ(h) = g ∘ h ∘ f⁻¹ for all
h ∈ F(A,B). The function Φ is represented in the commutative diagram following
this proof. It is straightforward to see that Φ(h) ∈ F(C,D) for all h ∈ F(A,B), so
Φ is well-defined. We need to show that Φ is bijective. Let h,k ∈ F(A,B). Suppose
that Φ(h) = Φ(k). Then g ∘ h ∘ f⁻¹ = g ∘ k ∘ f⁻¹. Hence g⁻¹ ∘ (g ∘ h ∘ f⁻¹) ∘ f =
g⁻¹ ∘ (g ∘ k ∘ f⁻¹) ∘ f, and making repeated use of Lemma 4.3.5 it follows that h = k.
Therefore Φ is injective. Now let r ∈ F(C,D). Let t = g⁻¹ ∘ r ∘ f. It can be seen that
t ∈ F(A,B). We compute Φ(t) = g ∘ t ∘ f⁻¹ = g ∘ (g⁻¹ ∘ r ∘ f) ∘ f⁻¹ = r. It follows
that Φ is surjective. Hence Φ is bijective.


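[Commutative diagram: the top row is h: A → B, the vertical maps are f: A → C and
g: B → D, and the bottom row is Φ(h): C → D.]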
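
The map Φ of Lemma 4.5.3 can be tried out on small finite sets. The Python sketch
below is an informal check, not a proof; the sets and the bijections f and g are
hypothetical choices made only for illustration, and the helper names are our own.

```python
from itertools import product

def all_functions(domain, codomain):
    """List every function domain -> codomain, each stored as a dict."""
    domain = list(domain)
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]

def compose(outer, inner):
    """Return the composition outer ∘ inner of two dict-based functions."""
    return {x: outer[inner[x]] for x in inner}

def inverse(fn):
    """Invert a bijective dict-based function."""
    return {value: key for key, value in fn.items()}

def as_key(fn):
    """Hashable form of a dict-based function, so functions can go in a set."""
    return frozenset(fn.items())

# Hypothetical bijections f: A -> C and g: B -> D on small sets.
f = {1: "a", 2: "b"}
g = {"x": 10, "y": 20, "z": 30}

def phi(h):
    """Send h in F(A,B) to g ∘ h ∘ f^(-1) in F(C,D), as in the lemma."""
    return compose(g, compose(h, inverse(f)))

source = all_functions(f.keys(), g.keys())      # F(A,B)
target = all_functions(f.values(), g.values())  # F(C,D)
images = {as_key(phi(h)) for h in source}

print(len(source), len(images))                 # equal counts: no two h collide
print(images == {as_key(r) for r in target})    # True: every r in F(C,D) is hit
```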

Our next result, which is a bit more complicated than the previous one, gives
a relation between power sets and sets of functions. More precisely, let A be a set.
The theorem says that there is a bijective function from P(A) to F(A,{0,1}). What,
intuitively, is the relation between elements of P(A), each of which is a subset of A,
and elements of F(A,{0,1}), each of which is a function A → {0,1}? Let S ∈ P(A),
so that S ⊆ A. We want to associate with this set S a function A → {0,1}. To do so,
observe that we can divide A into the two disjoint subsets S and A − S. We then define
a function from A to {0,1} by assigning the value of 1 to every element in S, and 0 to
every element in A − S. For different choices of S, we will obtain different functions
A → {0,1}. For convenience, we will use the notation and result of Exercise 4.1.8.
This theorem might seem rather technical, but we will use it in a few places, for
example Example 6.3.5, which is about switching circuits, and Example 6.7.6 (via
the proof of Exercise 6.7.7), which is about programming languages.


Theorem 4.5.4. Let A be a set. Then there is a bijective function from P(A) to 
F(A, {0,1}). 


Proof. If A = ∅, then P(∅) = {∅} by Example 3.2.9 (1), and F(A,{0,1}) = {∅} by
Example 4.5.2 (1), and therefore the identity map is a bijective function from P(A)
to F(A,{0,1}). Now suppose that A ≠ ∅. Recall the notation of Exercise 4.1.8.
Let Φ: P(A) → F(A,{0,1}) be defined by Φ(S) = χ_S for all S ∈ P(A). We give two
proofs that Φ is bijective, because each is instructive.

First Proof: We will show that Φ is bijective by showing that it is injective and
surjective, starting with the former. Let S,T ∈ P(A). Suppose that Φ(S) = Φ(T).
Then χ_S = χ_T, and it follows from Exercise 4.1.8 that S = T. Therefore Φ is injective.
We now show that Φ is surjective. Let f ∈ F(A,{0,1}). Let S = f⁻¹({1}), so
that S ∈ P(A). We will show that Φ(S) = f, which is the same as showing that χ_S = f.
Both χ_S and f are functions A → {0,1}. Observe that A − S = f⁻¹({0}). Then, if
y ∈ A, we see that

\[
\chi_S(y)
= \begin{cases} 1, & \text{if } y \in S \\ 0, & \text{if } y \in A - S \end{cases}
= \begin{cases} 1, & \text{if } y \in f^{-1}(\{1\}) \\ 0, & \text{if } y \in f^{-1}(\{0\}) \end{cases}
= \begin{cases} 1, & \text{if } f(y) = 1 \\ 0, & \text{if } f(y) = 0 \end{cases}
= f(y).
\]

Hence χ_S = f, and it follows that Φ is surjective.


Second Proof: We will show that Φ is bijective by producing an inverse for it. Let
Ψ: F(A,{0,1}) → P(A) be defined by Ψ(f) = f⁻¹({1}) for all f ∈ F(A,{0,1}).
We will show that Ψ is an inverse for Φ by showing that
Ψ ∘ Φ = 1_{P(A)} and Φ ∘ Ψ = 1_{F(A,{0,1})}.
Let S ∈ P(A). Then

(Ψ ∘ Φ)(S) = Ψ(Φ(S)) = [χ_S]⁻¹({1}) = S.

It follows that Ψ ∘ Φ = 1_{P(A)}.
Let f ∈ F(A,{0,1}). Then

(Φ ∘ Ψ)(f) = Φ(Ψ(f)) = Φ(f⁻¹({1})) = χ_{f⁻¹({1})}.

We therefore need to show that χ_{f⁻¹({1})} = f. Observe that A − f⁻¹({1}) = f⁻¹({0}).
Then, if y ∈ A, we see that

\[
\chi_{f^{-1}(\{1\})}(y)
= \begin{cases} 1, & \text{if } y \in f^{-1}(\{1\}) \\ 0, & \text{if } y \in A - f^{-1}(\{1\}) \end{cases}
= \begin{cases} 1, & \text{if } f(y) = 1 \\ 0, & \text{if } f(y) = 0 \end{cases}
= f(y).
\]

Hence χ_{f⁻¹({1})} = f, and we conclude that Φ ∘ Ψ = 1_{F(A,{0,1})}.
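
The two maps used in the second proof are easy to try out on a small finite set.
The Python sketch below is only an informal check of the two compositions; the
three-element set is a hypothetical example, the function chi plays the role of the
characteristic map of Exercise 4.1.8, and psi takes the preimage of {1}.

```python
from itertools import combinations, product

def chi(subset, universe):
    """Characteristic function of `subset`, stored as a dict universe -> {0, 1}."""
    return {x: 1 if x in subset else 0 for x in universe}

def psi(fn):
    """Preimage of {1}: the subset picked out by a 0/1-valued function."""
    return frozenset(x for x, value in fn.items() if value == 1)

A = ["a", "b", "c"]  # a hypothetical three-element set

# All subsets of A, and all functions A -> {0, 1}.
subsets = [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]
functions = [dict(zip(A, bits)) for bits in product([0, 1], repeat=len(A))]

# Taking chi and then psi returns the original subset, and taking psi and
# then chi returns the original function.
print(all(psi(chi(S, A)) == S for S in subsets))    # True
print(all(chi(psi(f), A) == f for f in functions))  # True
print(len(subsets), len(functions))                 # 8 8
```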


In addition to the set of all functions from one set to another, there are a number 
of other sets of functions that are of interest. We now define two types of sets of 
functions, though there are many other such sets of functions that the reader might 
encounter during further study of mathematics, for example the set of all linear maps 
from one vector space to another, which is an important concept in linear algebra. 


Definition 4.5.5. Let A and B be sets. The set of all injective functions A → B is
denoted I(A,B), and the set of all bijective functions A → B is denoted B(A,B). △


It is also possible to look at the set of all surjective functions from one set to
another, but we will not need it later on, and so we will not treat it here. For any
sets A and B, we observe that B(A,B) ⊆ I(A,B) ⊆ F(A,B). Unlike the set F(A,B),
which is never the empty set as long as both A and B are not empty, the set B(A,B)
will be the empty set whenever A and B do not have “the same size” (a concept that
is intuitively clear for finite sets, and that will be discussed for both finite sets and
infinite sets in Section 6.5). Similarly, the set I(A,B) will be empty whenever A is
“larger” than B.


Example 4.5.6. 


(1) Let A and B be the sets given in Example 4.5.2 (2). It is seen that B(A,B) =
I(A,B) = {g,h}.

(2) Let A = {1,2} and C = {x,y,z}. Then B(A,C) = ∅. As the reader is asked to
show in Exercise 4.5.8, it turns out that I(A,C) has six elements. This example is a
special case of general results given in Theorem 7.7.4. ◊
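
The counts in this example can be confirmed by brute force. The Python sketch below
is only an illustration; it stores functions as dictionaries, the helper names are
our own, and it reports only the sizes of the sets so as not to spoil Exercise 4.5.8.

```python
from itertools import product

def all_functions(domain, codomain):
    """List every function domain -> codomain, each stored as a dict."""
    domain = list(domain)
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]

def is_injective(fn):
    """A function is injective when no value is repeated."""
    return len(set(fn.values())) == len(fn)

def is_bijective(fn, codomain):
    """A function is bijective when it is injective and hits every value."""
    return is_injective(fn) and set(fn.values()) == set(codomain)

A, B, C = [1, 2], ["x", "y"], ["x", "y", "z"]

inj_AB = [fn for fn in all_functions(A, B) if is_injective(fn)]
bij_AB = [fn for fn in all_functions(A, B) if is_bijective(fn, B)]
inj_AC = [fn for fn in all_functions(A, C) if is_injective(fn)]
bij_AC = [fn for fn in all_functions(A, C) if is_bijective(fn, C)]

print(len(inj_AB), len(bij_AB))  # 2 2
print(len(inj_AC), len(bij_AC))  # 6 0
```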


Finally, we use sets of functions to resolve an issue that was left outstanding 
in Chapter 3. In Section 3.3 we defined the union, intersection and product of two 
sets. In Section 3.4 we showed how the definitions of union and intersection can be 
extended to arbitrary families of sets, rather than just two sets at a time, but we did 
not state how to form the product of an arbitrary family of sets, because we did not 
have the needed tools. We are now ready for the definition. 

We defined the product of two sets in terms of ordered pairs. Intuitively, an or- 
dered pair is something that picks out a “first” element and a “second” one. To gen- 
eralize this idea, we reformulate the notion of an ordered pair by using functions. 
(However, we could not have used this reformulation instead of our original dis- 
cussion of ordered pairs in Section 3.3, because we needed ordered pairs to define 
functions.) 

Let A and B be sets. We can think of an ordered pair (a,b) with a ∈ A and b ∈ B as
a function f: {1,2} → A ∪ B that satisfies the conditions f(1) ∈ A and f(2) ∈ B. The
element f(1) is the first element in the ordered pair, and f(2) is the second element
in the ordered pair. Hence the product A × B can be thought of as the set of functions


{f ∈ F({1,2}, A ∪ B) | f(1) ∈ A and f(2) ∈ B}.


This reformulation of the definition of A × B can be generalized to arbitrary families
of sets. We use the indexed form of such families, though the non-indexed version 
would work as well. 


Definition 4.5.7. Let I be a non-empty set, and let {A_i}_{i∈I} be a family of sets indexed
by I. The product of the family of sets, denoted ∏_{i∈I} A_i, is the set defined by


∏_{i∈I} A_i = {f ∈ F(I, ⋃_{i∈I} A_i) | f(i) ∈ A_i for all i ∈ I}.


If all the sets A_i are equal to a single set A, the product ∏_{i∈I} A_i is denoted by A^I. △


It is not hard to verify that if I is a non-empty set, and if A is a set, then
A^I = F(I,A). An example of this fact is R^N = F(N,R). Given our discussion in
Example 4.5.2 (4), we therefore see that R^N is the set of sequences of real numbers.
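
For a finite index set and finite sets, the product of Definition 4.5.7 can be listed
explicitly. The Python sketch below is only an illustration under that finiteness
assumption; the family shown is hypothetical, and the name family_product is our
own. Each element of the product is stored as a dictionary sending each index i to
the chosen element of A_i.

```python
from itertools import product

def family_product(family):
    """All functions f on the index set with f(i) in family[i] for every index i."""
    indices = list(family)
    return [dict(zip(indices, choice))
            for choice in product(*(family[i] for i in indices))]

# A hypothetical family indexed by I = {1, 2}.
family = {1: ["a", "b"], 2: [0, 1, 2]}
elements = family_product(family)

# With I = {1, 2}, each function f corresponds to the ordered pair (f(1), f(2)).
pairs = [(f[1], f[2]) for f in elements]
print(len(elements))                                   # 6
print(pairs == list(product(["a", "b"], [0, 1, 2])))   # True
```

With the index set {1,2}, the sketch recovers the usual product of two sets, in
line with the reformulation of ordered pairs given before the definition.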

The reader might have noticed that Definition 4.5.7 looks somewhat familiar, and 
that would be good, because it will help us address a subtlety about this definition that 
we have so far glossed over. It is fine to write such a definition, but simply writing 
something does not always suffice to make it work. In this specific case, we need to 
ask whether for any family of non-empty sets {A_i}_{i∈I}, there actually is something in
the set ∏_{i∈I} A_i. In other words, is there at least one function f: I → ⋃_{i∈I} A_i such that
f(i) ∈ A_i for all i ∈ I? Such a function would choose a single element from each set
A_i. Our ability to make such a choice is exactly what is axiomatized in the Axiom of
Choice, and that is what looks so familiar in Definition 4.5.7. If the reader compares 
the statement of the Axiom of Choice given in Theorem 4.1.5 with Definition 4.5.7, 
it is immediately evident that not only does the Axiom of Choice imply the following 
theorem, but the following theorem implies the Axiom of Choice; that is, the Axiom 
of Choice and the following theorem are equivalent. 


Theorem 4.5.8. Let I be a non-empty set, and let {A_i}_{i∈I} be a family of non-empty
sets indexed by I. Then ∏_{i∈I} A_i ≠ ∅.


Exercises 


Exercise 4.5.1. Let X = {l,m,n} and Y = {α,β}. Describe all the elements of
F(X,Y).

Exercise 4.5.2. [Used in Theorem 7.7.4.] Let A and B be sets. Prove that if A or B has
one element, there is a bijective function from F(A,B) to B.


Exercise 4.5.3. Let A and B be non-empty sets. Let Φ: F(A,B) → F(P(A),P(B))
be the function defined by Φ(f) = f_* for all f ∈ F(A,B), where f_*: P(A) → P(B)
is defined at the end of Section 4.2. Is Φ injective, surjective, both or neither?


Exercise 4.5.4. Let A, B and C be sets. Suppose that A ⊆ B.


(1) Prove that F(C,A) ⊆ F(C,B).
(2) Prove that there is an injective function F(A,C) → F(B,C).


Exercise 4.5.5. Let A, B, C be sets. Prove that there is a bijective function from
F(C, A × B) to F(C,A) × F(C,B).


Exercise 4.5.6. Let A, B, C be sets. Prove that there is a bijective function from
F(A, F(B,C)) to F(B, F(A,C)).


Exercise 4.5.7. Let A be a set, and let g: A → A be a function. Suppose that g is
bijective.

(1) Let Ω_g: F(A,A) → F(A,A) be defined by Ω_g(f) = g ∘ f for all f ∈ F(A,A).
Prove that Ω_g is bijective.
(2) Let Λ_g: F(A,A) → F(A,A) be defined by Λ_g(f) = g ∘ f ∘ g⁻¹ for all f ∈
F(A,A). Prove that Λ_g is bijective. Also, prove that Λ_g(h ∘ k) = Λ_g(h) ∘ Λ_g(k)
for all h,k ∈ F(A,A).


Exercise 4.5.8. [Used in Example 4.5.6.] Let A = {1,2} and C = {x,y,z}. Describe
explicitly all the elements of I(A,C).


Exercise 4.5.9. [Used in Section 7.7.] Let A, B, C and D be sets, and let f: A → C
and g: B → D be functions. Suppose that f and g are bijective.


(1) Prove that there is a bijective function from I(A,B) to I(C,D).
(2) Prove that there is a bijective function from B(A,B) to B(C,D).


Exercise 4.5.10. [Used in Theorem 7.7.4 and Theorem 7.7.12.] Let A and B be sets,
and let a ∈ A and b ∈ B.


(1) Prove that there is a bijective function from {f ∈ F(A,B) | f(a) = b} to
F(A − {a}, B).
(2) Prove that there is a bijective function from {f ∈ I(A,B) | f(a) = b} to
I(A − {a}, B − {b}).
(3) Prove that there is a bijective function from {f ∈ B(A,B) | f(a) = b} to
B(A − {a}, B − {b}).


Exercise 4.5.11. Let A be a set. Let Φ: B(A,A) → B(A,A) be defined by Φ(f) =
f⁻¹ for all f ∈ B(A,A). Prove that Φ is bijective.


Exercise 4.5.12. Let A be a set. A Z-action on A is a function Γ: Z → B(A,A) that
satisfies the following two properties: (1) Γ(0) = 1_A, and (2) Γ(a+b) = Γ(a) ∘ Γ(b)
for all a,b ∈ Z.


(1) Prove that Γ(−a) = [Γ(a)]⁻¹ for all a ∈ Z.

(2) Suppose that Γ(e) = 1_A for some e ∈ Z. Prove that Γ(ne) = 1_A for all n ∈ Z.

(3) Give two different examples of Z-actions on R.

(4) Give two different examples of Z-actions on the set {1,2,3,4}.


5


Relations 


Mathematicians do not study objects, but relations between objects. 
— Henri Poincaré (1854-1912) 


5.1 Relations 


In colloquial usage we say that there is a “relation” between two things if there is 
some connection between them. An example of a relation between people is that of 
having the same color hair, and another example is that of one person being the child 
of another person. In mathematics we also discuss relations between objects, but, as 
is often the case, the technical meaning of the word “relation” in mathematics is not 
entirely the same as the colloquial use of the word. Some examples of relations be- 
tween mathematical objects are very familiar, such as the relations = and < between 
real numbers. We saw some other relations in previous chapters, without having used 
the term “relation.” For example, we can define a relation between integers by say- 
ing that two integers a and b are related if and only if a|b. Relations (and especially 
equivalence relations, as discussed in Section 5.3), are used in crucial ways in many 
branches of mathematics, for example abstract algebra, number theory, topology and 
geometry. 

To get a feeling for the formal approach to relations, consider the relation of one 
person being a biological parent of another person. If we take any two people at 
random, say persons X and Y, then either X is a parent of Y or not. We can decide 
whether X is the parent of Y because we know the meaning of the word “parent,” and 
we know how to verify whether the condition of being someone’s parent is fulfilled. 
Alternatively, rather than relying on our knowledge of what being a parent means, 
we could list all pairs of people (X,Y), where X is a parent of Y. To verify whether 
two given people are a parent-child pair, we would then simply check two people 
against the list; such verification could be done by someone who did not know what 
the words “parent” and “child” meant. 

Similar to our formal definition of functions in Section 4.1, the formal approach 
to relations between mathematical objects is done in terms of listing pairs of related
objects. A mathematical relation might be randomly constructed, and is not neces-
sarily based on any inherent connection between “related” objects, in contrast to the
colloquial use of the word “relation.” To get the most broadly applicable definition, 
we allow relations between different types of objects (for example, a relation between 
people and numbers), rather than only between two objects of the same type. 


Definition 5.1.1. Let A and B be sets. A relation R from A to B is a subset R ⊆ A × B.
If a ∈ A and b ∈ B, we write a R b if (a,b) ∈ R, and a R̸ b if (a,b) ∉ R. A relation on
A is a relation from A to A. △


Example 5.1.2. 


(1) Let A = {1,2,3} and B = {x,y,z}. There are many possible relations from
A to B, one example of which would be the relation E defined by the set E =
{(1,y),(1,z),(2,y)}. Then 1 E y, and 1 E z, and 2 E y, but 3 E̸ x.

(2) Let P be the set of all people. Define a relation on P by having person x 
related to person y if and only if x and y have at least one parent in common. 

(3) The symbols < and ≤ both represent relations on R.

(4) Let P be the set of all people, and let B be the set of all books. Define a 
relation from P to B by having person x related to book b if and only if x has read b. 

(5) Let A be a set. The symbol “⊆” represents a relation on P(A), where P,Q ∈
P(A) are related if and only if P ⊆ Q.

(6) Let f: A → B be a function. Then f is defined by a subset of A × B satisfying
a certain condition. Hence f is also a relation from A to B. The concept of a relation 
is therefore seen to be more general than the concept of a function. In principle, 
it would have been logical to have the chapter on relations before the chapter on 
functions, and to view functions as a special case of relations. In practice, however, 
most mathematicians do not think of functions as special types of relations when they 
use functions on a daily basis, and therefore functions deserve their own treatment 
independent of the study of relations. ◊


Let R and S be relations from A to B. To say that “R = S” means that the two
relations are both defined by the same subset of A × B. This criterion can be rephrased
by saying that x R y if and only if x S y, for all x ∈ A and y ∈ B. A proof that R and S
are equal typically has the following form.


Proof. Let x ∈ A and y ∈ B. First, suppose that x R y.

(argumentation)

Then x S y.
Second, suppose that x S y.

(argumentation)

Then x R y.
Therefore R = S.


We will see an example of this strategy in the proof of Theorem 5.3.18. 

Just as a person might wish to find out who all of her relatives are, if we have a 
relation from a set A to a set B, it is sometimes useful to find all the elements of B 
that are related to a given element in A. 


Definition 5.1.3. Let A and B be non-empty sets, let R be a relation from A to B, and
let x ∈ A. The relation class of x with respect to R, denoted R[x], is the set defined
by

R[x] = {y ∈ B | x R y}.


If the relation R is understood from the context, we will often write [x] instead of
R[x]. △


Example 5.1.4. We continue the first three parts of Example 5.1.2. 


(1) For this relation we see that [1] = {y,z}, and [2] = {y}, and [3] = ∅.

(2) There are a number of distinct cases here, and we will examine a few of them. 
If x is the only child of each of her parents, then [x] = {x}, where we observe that 
x has the same parents as herself. If y and z are the only two children of each of 
their parents, then [y] = {y,z} = [z]. If a has one half-sibling b by her father, and 
another half-sibling c by her mother, and each of b and c has no other siblings or 
half-siblings, then [a] = {a,b,c}, and [b] = {a,b}, and [c] = {a,c}. 

(3) For the relation <, we see that [x] = (x,∞) for all x ∈ R, and for the relation
≤, we see that [x] = [x,∞) for all x ∈ R. ◊


In Example 5.1.4 we saw various possible behaviors of relation classes. The re- 
lation class of an element may be empty, for example [3] in Part (1) of the example. 
The relation class of an element need not contain that element, for example [x] for 
any x ∈ R with respect to the relation < in Part (3) of the example. Different elements
may have overlapping relation classes, for example [b] and [c] in Part (2) of the ex- 
ample. In fact, different elements can have identical relation classes, for example [y] 
and [z] in Part (2) of the example. In Section 5.3 we will discuss a certain type of 
relation with particularly nicely behaved relation classes. 
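
For a relation on finite sets, stored literally as a set of ordered pairs in the
spirit of Definition 5.1.1, relation classes can be computed directly. The Python
sketch below is only an illustration; it uses the relation E of Example 5.1.2 (1),
and the name relation_class is our own.

```python
def relation_class(relation, x):
    """Return R[x]: all y with (x, y) in the relation."""
    return {b for (a, b) in relation if a == x}

# The relation E from A = {1, 2, 3} to B = {x, y, z} of Example 5.1.2 (1).
E = {(1, "y"), (1, "z"), (2, "y")}

print(sorted(relation_class(E, 1)))  # ['y', 'z']
print(sorted(relation_class(E, 2)))  # ['y']
print(sorted(relation_class(E, 3)))  # []: a relation class may be empty
```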

In the following definition we give three properties of relations that will be
useful to us in the next two sections, and in many parts of mathematics. 


Definition 5.1.5. Let A be a non-empty set, and let R be a relation on A.


1. The relation R is reflexive if x R x, for all x ∈ A.
2. The relation R is symmetric if x R y implies y R x, for all x,y ∈ A.
3. The relation R is transitive if x R y and y R z imply x R z, for all x,y,z ∈ A. △


As seen in the following example, a relation can have any combination of the 
above three properties. In most of the parts of this example we leave it to the reader 
to verify that the given relation has the stated properties.


Example 5.1.6.


(1) The relation of congruence of triangles in the plane is reflexive, symmetric 
and transitive. 

(2) The relation of one person weighing within 5 lbs of another person is reflexive
and symmetric, but not transitive. The relation is not transitive, because if A, B and 
C are people who weigh 130, 133 and 136 lbs respectively, then A is related to B, 
and B is related to C, but A is not related to C. The relation is reflexive, because any 
person is within 0 lbs of her own weight. The relation is symmetric, because if X and 
Y are people who weigh within 5 lbs of each other, then Y and X weigh within 5 lbs 
of each other. 

(3) The relation ≤ on R is reflexive and transitive, but not symmetric.

(4) Let C = {1,2,3}, and let P be the relation on C defined by the set P = 
{(2,2), (3,3), (2,3), (3,2)}. Then P is symmetric and transitive, but not reflexive. 

(5) Let B = {x,y,z}, and let E be the relation on B defined by the set E =
{(x,x),(y,y),(z,z),(x,y),(y,z)}. Then E is reflexive, but neither symmetric nor
transitive. The relation is reflexive, because (x,x), and (y,y) and (z,z) are all in E, and
therefore x E x, and y E y and z E z. The relation is not symmetric, because x E y but
y E̸ x. The relation is not transitive, because x E y and y E z, but x E̸ z.

(6) The relation of one person being the cousin of another is symmetric, but 
neither reflexive nor transitive. 

(7) The relation < on R is transitive, but neither reflexive nor symmetric.
The relation is transitive, because if x,y,z ∈ R, and if x < y and y < z, then
x < z. The relation is not reflexive, because it is never the case that x < x, for any
x ∈ R. (Observe that we have much more here than the minimum needed to prove
that the relation < is not reflexive; it would have sufficed to know that z ≮ z for a
single z ∈ R.) The relation is not symmetric, because if x,y ∈ R, and x < y, it is never
the case that y < x. (Again, we have much more than is minimally needed to prove
that < is not symmetric.)

(8) The relation of one person being the daughter of another person is neither
reflexive, symmetric nor transitive. ◊
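
For a relation on a finite set, the three properties of Definition 5.1.5 can be
checked by examining all the relevant pairs. The Python sketch below is only an
illustration; it treats a relation as a set of ordered pairs, uses the relations of
Parts (4) and (5) of the example, and the function names are our own.

```python
def is_reflexive(relation, universe):
    """x R x for every x in the underlying set."""
    return all((x, x) in relation for x in universe)

def is_symmetric(relation):
    """x R y implies y R x."""
    return all((y, x) in relation for (x, y) in relation)

def is_transitive(relation):
    """x R y and y R z together imply x R z."""
    return all((x, w) in relation
               for (x, y) in relation
               for (z, w) in relation
               if y == z)

# Part (4): the relation P on C = {1, 2, 3}.
C = {1, 2, 3}
P = {(2, 2), (3, 3), (2, 3), (3, 2)}
print(is_reflexive(P, C), is_symmetric(P), is_transitive(P))  # False True True

# Part (5): the relation E on B = {x, y, z}.
B = {"x", "y", "z"}
E = {("x", "x"), ("y", "y"), ("z", "z"), ("x", "y"), ("y", "z")}
print(is_reflexive(E, B), is_symmetric(E), is_transitive(E))  # True False False
```

Observe that reflexivity needs the underlying set as an extra argument, because
the set of pairs alone does not record elements, such as 1 in Part (4), that are
related to nothing.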


There are standard proof strategies for proving that a relation is reflexive, sym- 
metric or transitive. Let A be a non-empty set, and let R be a relation on A. 

If we wish to prove that R is reflexive, we need to show that for every x ∈ A,
the condition x R x is true. Hence, a proof of the reflexivity of R typically has the
following form.


Proof. Let x ∈ A.

(argumentation)

Then x R x. Hence R is reflexive.


If we wish to prove that R is symmetric, we need to show that x R y implies y R x
for every x,y ∈ A. Observe that to prove that R is symmetric, we do not prove that
either x R y or y R x is true (in fact they might not be true for some values of x,y ∈ A),
but only that x R y implies y R x. Hence, a proof of the symmetry of R typically has
the following form.


Proof. Let x,y ∈ A. Suppose that x R y.

(argumentation)

Then y R x. Hence R is symmetric.


If we wish to prove that R is transitive, we need to show that x R y and y R z
together imply x R z for every x,y,z ∈ A. Again, observe that we do not prove that
x R y and y R z are true, but only that they imply x R z. Hence, a proof of the transitivity
of R typically has the following form.


Proof. Let x,y,z ∈ A. Suppose that x R y and y R z.

(argumentation)

Then x R z. Hence R is transitive.


Exercises 


Exercise 5.1.1. For each of the following relations on Z, find the relation classes [3]
and [−3] and [6].


(1) Let S be the relation defined by a S b if and only if a = |b|, for all a,b ∈ Z.
(2) Let D be the relation defined by a D b if and only if a|b, for all a,b ∈ Z.

(3) Let T be the relation defined by a T b if and only if b|a, for all a,b ∈ Z.

(4) Let Q be the relation defined by a Q b if and only if a + b = 7, for all a,b ∈ Z.


Exercise 5.1.2. For each of the following relations on R², give a geometric
description of the relation classes [(0,0)] and [(3,4)].


(1) Let S be the relation defined by (x,y) S (z,w) if and only if y = 3w, for all
(x,y),(z,w) ∈ R².

(2) Let T be the relation defined by (x,y) T (z,w) if and only if x² + 3y² = 7z² +
w², for all (x,y),(z,w) ∈ R².

(3) Let Z be the relation defined by (x,y) Z (z,w) if and only if x = z or y = w,
for all (x,y),(z,w) ∈ R².


Exercise 5.1.3. Let A = {1,2,3}. Each of the following subsets of A x A defines a 
relation on A. Is each relation reflexive, symmetric and/or transitive? 


(1) M = {(3,3),(2,2),(1,2),(2,1)}.

(2) N = {(1,1),(2,2),(3,3),(1,2)}.

(3) O = {(1,1),(2,2),(1,2)}.

(4) P = {(1,1),(2,2),(3,3)}.

(5) Q = {(1,2),(2,1),(1,3),(3,1),(1,1)}.

(6) R = {(1,2),(2,3),(3,1)}.

(7) T = {(1,1),(1,2),(2,2),(2,3),(3,3),(1,3)}.


Exercise 5.1.4. Is each of the following relations reflexive, symmetric and/or transi- 
tive? 


(1) Let S be the relation on R defined by x S y if and only if x = |y|, for all x,y ∈ R.

(2) Let P be the set of all people, and let R be the relation on P defined by x R y
if and only if x and y were not born in the same city, for all x,y ∈ P.

(3) Let T be the set of all triangles in the plane, and let G be the relation on T
defined by s G t if and only if s has greater area than t, for all triangles s,t ∈ T.

(4) Let P be the set of all people, and let M be the relation on P defined by x M y
if and only if x and y have the same mother, for all x,y ∈ P.

(5) Let P be the set of all people, and let N be the relation on P defined by x N y
if and only if x and y have the same color hair or the same color eyes, for all
x,y ∈ P.


(7) Let T be the relation on Z × Z defined by (x,y) T (z,w) if and only if there
is a line in R² that contains (x,y) and (z,w) and has slope an integer, for all
(x,y),(z,w) ∈ Z × Z.


Exercise 5.1.5. Let A be a set, and let R be a relation on A. Suppose that R is defined
by the set R ⊆ A × A. Let R′ be the relation on A defined by the set (A × A) − R.


(1) If R is reflexive, is R′ necessarily reflexive, necessarily not reflexive or not
necessarily either?

(2) If R is symmetric, is R′ necessarily symmetric, necessarily not symmetric or not
necessarily either?

(3) If R is transitive, is R′ necessarily transitive, necessarily not transitive or not
necessarily either?


Exercise 5.1.6. Let A be a set, and let R be a relation on A. Suppose that R is sym- 
metric and transitive. Find the flaw in the following alleged proof that this relation is 
necessarily reflexive; there must be a flaw by Example 5.1.6 (4). “Let x ∈ A. Choose
y ∈ A such that x R y. By symmetry we know that y R x, and then by transitivity we see
that x R x. Hence R is reflexive.” 


Exercise 5.1.7. Let A be a set, and think of ⊆ as defining a relation on P(A), as
stated in Example 5.1.2 (5). Is this relation reflexive, symmetric and/or transitive? 


Exercise 5.1.8. Let A be a set, and let R be a relation on A. 


(1) Suppose that R is reflexive. Prove that ⋃_{x∈A} [x] = A.

(2) Suppose that R is symmetric. Prove that x ∈ [y] if and only if y ∈ [x], for all
x,y ∈ A.

(3) Suppose that R is transitive. Prove that if x R y, then [y] ⊆ [x], for all x,y ∈ A.


Exercise 5.1.9. Let A and B be sets, let R be a relation on A and let f: A → B be a
function. The function f respects the relation R if x R y implies f(x) = f(y), for all
x,y ∈ A. Which of the following functions respects the given relation?


(1) Let f: R — R be defined by f(x) = x° for all x € R; let S be the relation on 
R defined by x S y if and only if |x| = |y|, for all x,y ∈ R.

(2) Let g: R → R be defined by g(x) = cos x for all x ∈ R; let W be the relation
on R defined by x W y if and only if x—y= me for some k € Z, for all x,y € R. 

(3) Let h: R → R be defined by h(x) = ⌊x⌋ for all x ∈ R, where ⌊x⌋ denotes the
greatest integer less than or equal to x; let T be the relation on R defined by
x T y if and only if |x − y| < 1, for all x,y ∈ R.

(4) Let k: R² → R be defined by k((x,y)) = 3x² + 6xy + 3y² for all (x,y) ∈ R²; let
M be the relation on R² defined by (x,y) M (z,w) if and only if x + y = z + w,
for all (x,y),(z,w) ∈ R².


Exercise 5.1.10. Let A and B be sets, let R be a relation on A and let f: A → B be a
function. Suppose that f is injective, and that it respects the relation R, as defined in 
Exercise 5.1.9. What, if anything, can be proved about the relation R? 


Exercise 5.1.11. Let A and B be sets, let R and S be relations on A and B, respectively,
and let f: A → B be a function. The function f is relation preserving if x R y if and
only if f(x) S f(y), for all x,y ∈ A.


(1) Suppose that f is bijective and relation preserving. Prove that f⁻¹ is relation
preserving.

(2) Suppose that f is surjective and relation preserving. Prove that R is reflexive, 
symmetric or transitive if and only if S is reflexive, symmetric or transitive, 
respectively. 


5.2 Congruence 


In this section we discuss a very important type of relation on the set of integers, 
which will serve to illustrate the general topic discussed in the next section, and is 
also a valuable tool in various parts of mathematics and its applications, for example 
number theory, cryptography and calendars. See [Ros05, Chapters 4 and 5] for further
discussion of congruence and its applications, and see [Kob87] for a treatment
of congruence and cryptography. 

The idea of congruence is based upon the notion of “clock arithmetic,” a term
sometimes used in elementary mathematics. (For the reader who has not seen “clock
arithmetic,” it will be sufficient to have seen a clock.) For the sake of uniformity, we
will make all references to time using the American 12-hour system (ignoring a.m.
vs. p.m.), as opposed to the 24-hour system used in many places around the world, and
in the U.S. military.

Suppose that it is 2 o’clock, and you want to know what time it will be in 3
hours. Clearly the answer is 2 + 3 = 5 o’clock. Now suppose that it is 7 o’clock, and
you want to know what time it will be in 6 hours. A similar calculation would yield
7 + 6 = 13 o’clock, but the correct answer would be 1 o’clock, which is found by
subtracting 12 from 13, because 13 is greater than 12. Similarly, if it is 11 o’clock and
you want to know what time it will be after 30 hours, you first compute 11 + 30 = 41,
and you obtain a number from 1 to 12 by subtracting the number 12 as many times as
needed from 41 until a number in the 1 to 12 range is obtained. This method yields
41 − 36 = 5 o’clock.

Let us now drop the “o’clock.” In the previous paragraph, there were two conflicting
things we wanted to accomplish: to restrict ourselves to the integers from 1 to
12, and to be able to add numbers even when it took us outside of the 1 to 12 range.
To resolve this problem, we took any number that was outside the desired range,
and reduced it by multiples of 12 until we were back in the 1 to 12 range, where by
“multiple” we mean an integer multiple. For example, we reduced 41 to 5 by subtracting
3 times 12. We therefore consider 41 and 5 as equivalent from the point of
view of clocks (though of course these two numbers are not necessarily equivalent
from other points of view). In general, two integers are equivalent in this approach if
they differ by some multiple of 12. For example, we see that 28 and 4 are equivalent
in this sense, but 17 and 3 are not.
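
The clock computations above can be sketched in a few lines of Python. This is only an illustration; the function names clock_reduce and clock_equivalent are made up here and do not appear elsewhere in the text.

```python
def clock_reduce(hours: int) -> int:
    """Reduce any integer to the 1-to-12 range by removing multiples of 12."""
    # For a positive modulus, Python's % returns a value in 0..11; shifting by 1
    # before and after the reduction moves that range to 1..12.
    return (hours - 1) % 12 + 1


def clock_equivalent(a: int, b: int) -> bool:
    """Two integers are equivalent on a clock if they differ by a multiple of 12."""
    return (a - b) % 12 == 0


print(clock_reduce(7 + 6))       # 1
print(clock_reduce(11 + 30))     # 5
print(clock_equivalent(28, 4))   # True
print(clock_equivalent(17, 3))   # False
```

The last two lines reproduce the observation that 28 and 4 are equivalent in this sense, whereas 17 and 3 are not.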

We used the number 12 in the above discussion because of our familiarity with 
clocks, but, as we state formally in the following definition, the same procedure 
works with any other natural number replacing 12. 


Definition 5.2.1. Let n ∈ ℕ, and let a, b ∈ ℤ. The number a is congruent to the
number b modulo n, denoted a ≡ b (mod n), if a − b = kn for some k ∈ ℤ. △


Example 5.2.2. We see that 19 ≡ −5 (mod 4), because 19 − (−5) = 24 = 6 · 4; and
7 ≡ 7 (mod 3), because 7 − 7 = 0 = 0 · 3; and 13 ≢ 2 (mod 9), because 13 − 2 = 11
and 11 is not a multiple of 9. ◊
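
Definition 5.2.1 translates directly into a test for divisibility. The following Python sketch (the name is_congruent is chosen here for illustration) checks the three cases of Example 5.2.2.

```python
def is_congruent(a: int, b: int, n: int) -> bool:
    """Return True if a - b = k*n for some integer k (Definition 5.2.1)."""
    if n < 1:
        raise ValueError("n must be a natural number")
    # a - b is an integer multiple of n exactly when the remainder is 0.
    return (a - b) % n == 0


print(is_congruent(19, -5, 4))  # True
print(is_congruent(7, 7, 3))    # True
print(is_congruent(13, 2, 9))   # False
```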


For each n ∈ ℕ, we obtain a relation on ℤ given by congruence modulo n. The
following lemma shows that for each n, this relation is reflexive, symmetric and
transitive, as defined in Section 5.1.


Lemma 5.2.3. Let n ∈ ℕ, and let a, b, c ∈ ℤ.

1. a ≡ a (mod n).
2. If a ≡ b (mod n), then b ≡ a (mod n).
3. If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod n).

Proof.

(1). Observe that a − a = 0 · n.

(2). Suppose that a ≡ b (mod n). Then a − b = kn for some k ∈ ℤ. Hence b − a =
(−k)n. Because −k ∈ ℤ, it follows that b ≡ a (mod n).

(3). Suppose that a ≡ b (mod n) and b ≡ c (mod n). Then a − b = kn and b − c =
jn for some k, j ∈ ℤ. Adding these two equations we obtain a − c = (k + j)n. Because
k + j ∈ ℤ, it follows that a ≡ c (mod n).

We now prove a more substantial result about congruence modulo n. The proof of
this theorem makes use of an important fact about the integers known as the Division
Algorithm, which is stated as Theorem A.5 in the Appendix.


Theorem 5.2.4. Let n ∈ ℕ, and let a ∈ ℤ. Then there is a unique r ∈ {0, ..., n − 1}
such that a ≡ r (mod n).


Proof. To prove uniqueness, suppose that there are x, y ∈ {0, ..., n − 1} such that
a ≡ x (mod n) and a ≡ y (mod n). It follows from Lemma 5.2.3 (2) that x ≡ a
(mod n), and from Lemma 5.2.3 (3) that x ≡ y (mod n). That is, we have x − y = pn
for some p ∈ ℤ. On the other hand, because x, y ∈ {0, ..., n − 1}, it follows that
−(n − 1) ≤ x − y ≤ n − 1. We deduce that p = 0, and hence that x = y.

To prove existence, we use the Division Algorithm (Theorem A.5) to deduce that
there are q, r ∈ ℤ such that a = nq + r and 0 ≤ r < n. Hence a − r = qn, and therefore
a ≡ r (mod n).
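
As an informal numerical companion to Theorem 5.2.4, the Python sketch below computes the remainder r with the built-in divmod, which for a positive divisor returns q and r with a = nq + r and 0 ≤ r < n, and also lists by brute force every r in {0, ..., n − 1} that is congruent to a. The function names are assumptions made for this illustration.

```python
def remainder(a: int, n: int) -> int:
    """Existence: the Division Algorithm gives a = n*q + r with 0 <= r < n."""
    q, r = divmod(a, n)  # for n > 0, Python guarantees 0 <= r < n
    assert a == n * q + r and 0 <= r < n
    return r


def congruent_candidates(a: int, n: int) -> list[int]:
    """Uniqueness check: all r in {0, ..., n-1} with a congruent to r mod n."""
    return [r for r in range(n) if (a - r) % n == 0]


print(remainder(41, 12))            # 5
print(remainder(-7, 5))             # 3
print(congruent_candidates(-7, 5))  # [3]
```

The brute-force list always has exactly one entry, in agreement with the theorem; this is a check on examples, not a substitute for the proof.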


We can restate Theorem 5.2.4 without reference to congruence modulo n. 


Corollary 5.2.5. Let n ∈ ℕ, and let a ∈ ℤ. Then precisely one of the following holds:
either a = nk for some k ∈ ℤ, or a = nk + 1 for some k ∈ ℤ, or a = nk + 2 for some
k ∈ ℤ, ..., or a = nk + (n − 1) for some k ∈ ℤ.


If we use n = 2 in Corollary 5.2.5, we deduce the following familiar result.

Corollary 5.2.6. Let a ∈ ℤ. Then a is even or odd, but not both.


Another way to think about Theorem 5.2.4 is by using relation classes with re- 
spect to congruence modulo n. Let us examine the case n = 5, where we list a few of 
the relation classes: 


[0] = {..., −10, −5, 0, 5, 10, ...}
[1] = {..., −9, −4, 1, 6, 11, ...}
[2] = {..., −8, −3, 2, 7, 12, ...}
[3] = {..., −7, −2, 3, 8, 13, ...}
[4] = {..., −6, −1, 4, 9, 14, ...}
[5] = {..., −5, 0, 5, 10, 15, ...}.


We see that the relation classes repeat themselves every five integers. Hence 


[0] = [5] = [10] = ···
[1] = [6] = [11] = ···
[2] = [7] = [12] = ···
[3] = [8] = [13] = ···
[4] = [9] = [14] = ···

Although a relation class is defined for every integer, there are in fact only five distinct
classes. Moreover, these classes are disjoint, and their union is all of ℤ. The
analogous result holds for arbitrary n, as stated in the following theorem.
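
The behavior of the relation classes for n = 5 can be observed on a finite window of integers. The Python sketch below is an illustration only: a computer cannot list an infinite class, so relation_class (a name invented here) restricts each class to the window −10, ..., 15.

```python
def relation_class(a: int, n: int, window) -> frozenset:
    """The part of the relation class [a] modulo n that lies in `window`."""
    return frozenset(x for x in window if (x - a) % n == 0)


window = range(-10, 16)
classes = {relation_class(a, 5, window) for a in window}

print(len(classes))                          # 5
print(sorted(relation_class(1, 5, window)))  # [-9, -4, 1, 6, 11]

# Distinct classes are disjoint, and together they cover the window.
disjoint = all(c.isdisjoint(d) for c in classes for d in classes if c != d)
covers = set().union(*classes) == set(window)
print(disjoint, covers)                      # True True
```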


Theorem 5.2.7. Let n ∈ ℕ.

1. Let a, b ∈ ℤ. If a ≡ b (mod n), then [a] = [b]. If a ≢ b (mod n), then [a] ∩
[b] = ∅.
2. [0] ∪ [1] ∪ ... ∪ [n − 1] = ℤ.


Proof.

(1). Suppose that a ≡ b (mod n). Let x ∈ [a]. Then by the definition of relation
classes we know that a ≡ x (mod n). By Lemma 5.2.3 (2) it follows that b ≡ a
(mod n), and hence by Lemma 5.2.3 (3) we deduce that b ≡ x (mod n). Therefore
x ∈ [b], and hence [a] ⊆ [b]. A similar argument shows that [b] ⊆ [a]. We conclude
that [a] = [b].

Now assume that a ≢ b (mod n). We use proof by contradiction. Suppose that
[a] ∩ [b] ≠ ∅. Hence there is some y ∈ [a] ∩ [b]. Then y ∈ [a] and y ∈ [b], so that a ≡ y
(mod n) and b ≡ y (mod n). By Lemma 5.2.3 (2) we see that y ≡ b (mod n), and
by Lemma 5.2.3 (3) it follows that a ≡ b (mod n), which is a contradiction. We
conclude that [a] ∩ [b] = ∅.

(2). By definition [a] ⊆ ℤ for all a ∈ ℤ, and therefore [0] ∪ ... ∪ [n − 1] ⊆ ℤ. Let
x ∈ ℤ. By Theorem 5.2.4 there is a unique r ∈ {0, ..., n − 1} such that x ≡ r (mod n).
It follows from Lemma 5.2.3 (2) that r ≡ x (mod n). Hence x ∈ [r]. Because r ∈
{0, ..., n − 1}, it follows that x ∈ [0] ∪ ... ∪ [n − 1]. Therefore ℤ ⊆ [0] ∪ ... ∪ [n − 1].
We conclude that [0] ∪ ... ∪ [n − 1] = ℤ.


Theorem 5.2.7 shows that relation classes for congruence modulo n are much 
better behaved than relation classes for arbitrary relations, as seen in Example 5.1.4. 
We are now ready for the following definition. 


Definition 5.2.8. Let n ∈ ℕ. The set of integers modulo n, denoted ℤ_n, is the set
defined by ℤ_n = {[0], [1], ..., [n − 1]}, where the relation classes are for congruence
modulo n. △


The set ℤ_n is also denoted ℤ/nℤ in some texts, for reasons that will become
apparent if the reader learns about group theory.


Example 5.2.9. The integers modulo 12 is the set ℤ_12 = {[0], [1], ..., [11]}. This
set has 12 elements, each of which is itself a set (namely, a relation class), but
which is viewed here as a single element in the set ℤ_12. The relation classes in
ℤ_12 could each be described differently. For example, we see that [0] = [12], and
so ℤ_12 = {[12], [1], ..., [11]}, which is what we see on the face of a clock. For mathematical
purposes it is more convenient to write [0] rather than [12], and so we will
continue to write ℤ_12 as we did originally; it would also be nice to have the 12 on
clocks replaced with 0, but historical practice holds sway over mathematics in this
situation. There are, of course, many other ways to rewrite the elements of ℤ_12, for
example ℤ_12 = {[−36], [25], [−10], ..., [131]}, and so it would in principle be possible
to replace the numbers on a clock with −36, 25, −10, ..., 131, though presumably
only mathematicians would find that amusing. ◊


For each n ∈ ℕ, the set ℤ_n has n elements. Of course, for each n ∈ ℕ, there
are many sets with n elements, but what makes ℤ_n particularly useful is that there
is a natural way to define addition and multiplication on it, as seen in the following
definition. Addition and multiplication are examples of binary operations, which
produce one output for every pair of inputs. Binary operations will be discussed in
Section 7.1, but for now it is sufficient to think of addition and multiplication on ℤ_n
simply as analogs of the familiar addition and multiplication of real numbers.


Definition 5.2.10. Let n ∈ ℕ. Let + and · be the binary operations on ℤ_n defined by
[a] + [b] = [a + b] and [a] · [b] = [ab] for all [a], [b] ∈ ℤ_n. △


As reasonable as Definition 5.2.10 seems, there is a potential problem. Let n ∈
ℕ, and let [a], [b], [c], [d] ∈ ℤ_n. Suppose that [a] = [c] and [b] = [d]. Do [a + b] =
[c + d] and [ab] = [cd] necessarily hold? If not, then we could not say that [a] +
[b] = [c] + [d] and [a] · [b] = [c] · [d], and then + and · would not be well-defined
binary operations on ℤ_n, because [a] + [b] and [a] · [b] would depend not just on
the relation classes [a] and [b], but on the particular choice of a and b. This sort
of verification is often needed whenever something is defined for relation classes
by using representative elements of the classes. Neglecting such verification is a
common mistake. Fortunately, everything works as desired in the present case, which
we verify using the following lemma.


Lemma 5.2.11. Let n ∈ ℕ, and let a, b, c, d ∈ ℤ. Suppose that a ≡ c (mod n) and
b ≡ d (mod n). Then a + b ≡ c + d (mod n) and ab ≡ cd (mod n).


Proof. There exist k, j ∈ ℤ such that a − c = kn and b − d = jn. Then a = c + kn and
b = d + jn, and therefore

    a + b = (c + kn) + (d + jn) = c + d + (k + j)n,
    ab = (c + kn)(d + jn) = cd + (cj + dk + kjn)n.

The desired result now follows.


From Lemma 5.2.11, together with Theorem 5.2.7 (1), we deduce the following
corollary, which we state without proof. This corollary tells us that + and · as given
in Definition 5.2.10 are indeed well-defined for each ℤ_n.


Corollary 5.2.12. Let n ∈ ℕ, and let [a], [b], [c], [d] ∈ ℤ_n. Suppose that [a] = [c] and
[b] = [d]. Then [a + b] = [c + d] and [ab] = [cd].


It is important to observe that the binary operations + and · are different in each
set ℤ_n. For example, in ℤ_7 we see that [6] + [4] = [10] = [3], whereas in ℤ_9 we see
that [6] + [4] = [10] = [1].
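
One way to experiment with Definition 5.2.10 is to model an element of ℤ_n in Python. The sketch below is a minimal illustration under assumptions of our own: the class name ZMod and the choice to store each class by its representative in {0, ..., n − 1} (which exists and is unique by Theorem 5.2.4) are not part of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZMod:
    """The relation class [rep] in Z_n, stored by its representative in 0..n-1."""
    rep: int
    n: int

    def __post_init__(self):
        # Normalize the representative, so that equal classes compare equal.
        object.__setattr__(self, "rep", self.rep % self.n)

    def _check(self, other: "ZMod") -> None:
        if self.n != other.n:
            raise ValueError("classes belong to different sets Z_n")

    def __add__(self, other: "ZMod") -> "ZMod":
        self._check(other)
        return ZMod(self.rep + other.rep, self.n)   # [a] + [b] = [a + b]

    def __mul__(self, other: "ZMod") -> "ZMod":
        self._check(other)
        return ZMod(self.rep * other.rep, self.n)   # [a] . [b] = [ab]


print(ZMod(6, 7) + ZMod(4, 7) == ZMod(3, 7))   # True
print(ZMod(6, 9) + ZMod(4, 9) == ZMod(1, 9))   # True
# Different representatives of the same classes give the same sum and product.
print(ZMod(13, 7) + ZMod(-3, 7) == ZMod(6, 7) + ZMod(4, 7))  # True
print(ZMod(13, 7) * ZMod(-3, 7) == ZMod(6, 7) * ZMod(4, 7))  # True
```

The last two lines illustrate, for particular numbers only, the independence from the choice of representatives that Corollary 5.2.12 guarantees in general.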

One nice way of working with + and · on ℤ_n is to make operation tables,
which are analogous to the multiplication tables often used in elementary school.
(See Section 7.1 for more discussion of such operation tables.) Consider the following
tables for ℤ_6.


  +  | [0] [1] [2] [3] [4] [5]          ·  | [0] [1] [2] [3] [4] [5]
 ----+------------------------         ----+------------------------
 [0] | [0] [1] [2] [3] [4] [5]         [0] | [0] [0] [0] [0] [0] [0]
 [1] | [1] [2] [3] [4] [5] [0]         [1] | [0] [1] [2] [3] [4] [5]
 [2] | [2] [3] [4] [5] [0] [1]         [2] | [0] [2] [4] [0] [2] [4]
 [3] | [3] [4] [5] [0] [1] [2]         [3] | [0] [3] [0] [3] [0] [3]
 [4] | [4] [5] [0] [1] [2] [3]         [4] | [0] [4] [2] [0] [4] [2]
 [5] | [5] [0] [1] [2] [3] [4]         [5] | [0] [5] [4] [3] [2] [1]


The table for + has a very nice pattern, in which the entries are constant on each line
of slope 1. Moreover, every element of ℤ_6 appears precisely once in each row and
in each column of the table. These same properties hold in the table for + for every
ℤ_n. The table for · for ℤ_6 is not as well behaved. For example, not every element
of ℤ_6 appears in each row and in each column, though some rows and columns do
have all elements. However, the tables for · for the other ℤ_n do not all behave the
same as for ℤ_6. The issue has to do with prime numbers, and whether or not various
numbers have common factors. A thorough study of these questions makes use of
some number theoretic issues. See [Fra03, Section 20] for details.
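
The operation tables for any ℤ_n are easy to generate by machine, which makes it possible to test the observations above on examples. In the following Python sketch, the names operation_table and every_element_once are invented for this illustration, and each class [a] is represented by the integer a in {0, ..., n − 1}.

```python
def operation_table(n: int, op) -> list[list[int]]:
    """Table whose (a, b) entry is the representative of op(a, b) in 0..n-1."""
    return [[op(a, b) % n for b in range(n)] for a in range(n)]


def every_element_once(table: list[list[int]]) -> bool:
    """True if each element appears exactly once in every row and column."""
    n = len(table)
    full = set(range(n))
    rows_ok = all(set(row) == full for row in table)
    cols_ok = all({table[i][j] for i in range(n)} == full for j in range(n))
    return rows_ok and cols_ok


add6 = operation_table(6, lambda a, b: a + b)
mul6 = operation_table(6, lambda a, b: a * b)

print(every_element_once(add6))  # True
print(every_element_once(mul6))  # False
# Rows of the multiplication table for Z_6 that contain every element:
print([a for a in range(6) if set(mul6[a]) == set(range(6))])  # [1, 5]
```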

A related question is whether equations of the form [a] · x = [b] can be solved
in any ℤ_n. The analogous equation involving real numbers, that is, an equation of
the form ax = b, always has a unique solution whenever a ≠ 0. The situation in ℤ_n
is more complicated. Consider the equation [4] · x = [3]. In ℤ_11 there is a unique
solution, which is x = [9], as can be verified simply by trying each element of ℤ_11 as
a possible candidate for x. In ℤ_12 the same equation has no solution, as can again be
verified by trying each element of ℤ_12. The equation [3] · x = [0] has three solutions
in ℤ_6, which are x = [0], [2], [4], as can be seen using the operation table for · for ℤ_6.
We therefore see that in ℤ_6 it is possible to have two non-zero elements such that
their product is [0], in contrast to the situation for multiplication of real numbers. See
[Fra03, Section 20] for further discussion of this type of equation in ℤ_n.
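
The method of trying each element of ℤ_n as a candidate for x is easily automated. The Python sketch below (the name solve_linear is an assumption of this illustration) returns the representatives in {0, ..., n − 1} of all solutions of [a] · x = [b] in ℤ_n.

```python
def solve_linear(a: int, b: int, n: int) -> list[int]:
    """All x in {0, ..., n-1} with [a] . [x] = [b] in Z_n, found by exhaustion."""
    return [x for x in range(n) if (a * x - b) % n == 0]


print(solve_linear(4, 3, 11))  # [9]
print(solve_linear(4, 3, 12))  # []
print(solve_linear(3, 0, 6))   # [0, 2, 4]
```

Exhaustive search is adequate for small n; it gives no insight into why the number of solutions varies, which is the number theoretic question referred to above.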

Our final comment about ℤ_n takes us back to our initial discussion of clocks,
where we took the number 41 (which was the result of starting at 11 o’clock and
adding 30 hours), and we then subtracted the number 12 as many times as needed
from 41 until a number in the 1 to 12 range was obtained; in this case we obtained
the number 5. There are, of course, infinitely many numbers in ℤ that, when treated
in this same way, will yield the number 5. Rather than thinking of the number 5 here
as an integer, it is more correct to think of it as an element of ℤ_12. That is, we think
of taking infinitely many elements of ℤ, and we send them all to a single element
in ℤ_12. Of course, there is nothing special about the number 5, and there is nothing
special about working modulo 12. We now use functions to formalize this process.


Definition 5.2.13. Let n ∈ ℕ. The canonical map for congruence modulo n is the
function γ: ℤ → ℤ_n defined by γ(a) = [a] for all a ∈ ℤ. △

Observe that there is a distinct function γ for each n ∈ ℕ, but to avoid unnecessarily
cumbersome notation (such as γ_n), we will assume that the number n is always
known from the context.

The canonical map γ: ℤ → ℤ_n is a special case of a more general type of canonical
map that will be seen in Definition 5.3.8.

We now see two simple results about the canonical map that are examples of
more general, though rather different, phenomena we will see subsequently.


Lemma 5.2.14. Let n ∈ ℕ, let B be a set and let f: ℤ → B be a function. Suppose
that if a, b ∈ ℤ and a ≡ b (mod n), then f(a) = f(b). Then there exists a unique
function g: ℤ_n → B such that f = g ∘ γ.


Proof. Left to the reader in Exercise 5.2.10. 


Using the terminology of Exercise 5.1.9, we say that the function f in Lemma 5.2.14
respects congruence modulo n. The condition f = g ∘ γ in Lemma 5.2.14 is represented
by the following commutative diagram (as discussed in Section 4.3).


         f
    ℤ -------> B
    |        ↗
  γ |      /  g
    ↓    /
    ℤ_n


In contrast to Lemma 5.2.14, which can be generalized to all equivalence re- 
lations, as seen in Lemma 5.3.9, the following property of the canonical map for 
congruence modulo n is not applicable to most equivalence relations, though it can 
be generalized in a very different way, as discussed in Section 7.3. 




Lemma 5.2.15. Let n ∈ ℕ, and let a, b ∈ ℤ. Then γ(a + b) = γ(a) + γ(b) and γ(ab) =
γ(a) · γ(b).


Proof. Left to the reader in Exercise 5.2.11. 
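
Before attempting Exercises 5.2.10 and 5.2.11, it may help to see Lemmas 5.2.14 and 5.2.15 at work numerically. The Python sketch below is an illustration under assumptions made here: the modulus N = 4, the function f that sends an integer to its parity, and the representation of each class by its representative in {0, ..., N − 1} are all choices of this example, and checking finitely many integers proves nothing.

```python
N = 4  # hypothetical modulus chosen for this illustration


def gamma(a: int) -> int:
    """Canonical map: send a to [a], stored as its representative in 0..N-1."""
    return a % N


def add(x: int, y: int) -> int:
    """Addition in Z_N on representatives."""
    return (x + y) % N


def mul(x: int, y: int) -> int:
    """Multiplication in Z_N on representatives."""
    return (x * y) % N


def f(a: int) -> str:
    """A function on Z that respects congruence modulo 4: the parity of a."""
    return "even" if a % 2 == 0 else "odd"


def g(cls: int) -> str:
    """The induced function on Z_N; evaluating f on a representative is
    legitimate only because f respects congruence modulo N."""
    return f(cls)


sample = range(-20, 21)
print(all(f(a) == g(gamma(a)) for a in sample))          # True
print(all(gamma(a + b) == add(gamma(a), gamma(b)) and
          gamma(a * b) == mul(gamma(a), gamma(b))
          for a in sample for b in sample))              # True
```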


Exercises 


Exercise 5.2.1. Which of the following are true and which are false?

(1) 13 ≡ 5 (mod 2).
(2) 21 ≡ 7 (mod 5).
(3) 7 ≡ 0 (mod 2).
(4) 3 ≡ 28 (mod 5).
(5) 23 ≡ 23 (mod 7).


Exercise 5.2.2. Solve each of the following equations in the given set ℤ_n. (In some
cases there is no solution.)

(1) [5] + x = [1] in ℤ_9.
(2) [2] · x = [7] in ℤ_11.
(3) x · [6] = [4] in ℤ_5.
(4) x · [6] = [2] in ℤ_10.
(5) [3] · x + [4] = [1] in ℤ_5.


Exercise 5.2.3. Find n ∈ ℕ and a, b ∈ ℤ such that a² ≡ b² (mod n) but a ≢ b
(mod n).


Exercise 5.2.4. Let n, q ∈ ℕ, and let a, b ∈ ℤ. Suppose that a ≡ b (mod n), and that
q | n. Prove that a ≡ b (mod q).


Exercise 5.2.5. Let n ∈ ℕ, and let a, b ∈ ℤ. Suppose that a ≡ b (mod n). Prove that
n | a if and only if n | b.


Exercise 5.2.6. Prove or give a counterexample for each of the following proposed
cancellation laws.

(1) Let n ∈ ℕ, and let a, b, c ∈ ℤ. Then a + c ≡ b + c (mod n) implies a ≡ b
(mod n).

(2) Let n ∈ ℕ, and let a, b, c ∈ ℤ − {0}. Suppose that c is not a multiple of n. Then
ac ≡ bc (mod n) implies a ≡ b (mod n).


Exercise 5.2.7. For this exercise we use factorials. If m ∈ ℕ, then the factorial of m is
m! = m(m − 1)(m − 2) ··· 2 · 1. It is assumed that the reader is familiar with factorials
informally; a formal definition of this concept is given in Example 6.4.4 (1), where
the “···” are avoided.

Let n ∈ ℕ. Suppose that n > 1, and that (n − 1)! ≡ −1 (mod n). Prove that n is a
prime number. The converse to this result, known as Wilson’s Theorem, is also true,
but has a slightly lengthier proof; see [AR89, Section 3.5] or [Ros05, Section 6.1]
for details.


Exercise 5.2.8. Let n ∈ ℤ. Prove that n³ ≡ n (mod 6).


Exercise 5.2.9. Let n ∈ ℤ. Prove that precisely one of the following is true: n² ≡ 0
(mod 16), or n² ≡ 1 (mod 8), or n² ≡ 4 (mod 16).


Exercise 5.2.10. [Used in Lemma 5.2.14.] Prove Lemma 5.2.14. 
Exercise 5.2.11. [Used in Lemma 5.2.15.] Prove Lemma 5.2.15. 


Exercise 5.2.12. [Used in Exercise 5.2.13 and Section 8.3.] Is there a relation between
a natural number and the sum of its digits? We now have the tools to answer this
question. Let x ∈ ℕ. We can write x in decimal notation as a_m a_{m−1} ··· a_2 a_1, where
a_i is an integer such that 0 ≤ a_i ≤ 9 for all i ∈ {1, ..., m}. That notation means
x = ∑_{i=1}^{m} a_i 10^{i−1}. The sum of the digits of x is therefore ∑_{i=1}^{m} a_i. Prove that

    ∑_{i=1}^{m} a_i 10^{i−1} ≡ ∑_{i=1}^{m} a_i (mod 9).

You may use the fact that the statement of Lemma 5.2.11 can be extended to sums
and products of any finite number of integers.

Exercise 5.2.13. This exercise continues Exercise 5.2.12. For each part of this exercise,
it is acceptable to use informal arguments, because rigorous proofs require proof
by induction, which we have not yet seen (we will see it in detail in Section 6.3).
Let a, b ∈ ℕ.

(1) Let Σ(a) denote the sum of the digits of a. For any m ∈ ℕ, let Σ^m(a) denote
Σ(Σ(Σ(···Σ(a)···))), with Σ repeated m times. Let Σ^0(a) = a. Prove that
there is some r ∈ ℕ ∪ {0} such that Σ^r(a) has a single digit.

(2) Let M(a) denote the smallest n ∈ ℕ such that Σ^n(a) is a single digit. (It makes
sense intuitively that there is such a smallest natural number; formally we
make use of the Well-Ordering Principle (Theorem 6.2.5).) If a has only one
digit, let M(a) = 0. Does M(a + b) = M(a) + M(b) always hold? Give a
proof or a counterexample. Does M(a + b) ≥ M(a) + M(b) always hold? Give
a proof or a counterexample. Does M(a + b) ≤ M(a) + M(b) always hold?
Give a proof or a counterexample.

(3) Let Σ̄(a) be an abbreviation for Σ^{M(a)}(a); that is, the number Σ̄(a) is the
result of repeatedly adding the digits of the number a until a single digit
remains. (This process is used in gematria, a method employed in Jewish
mysticism, as well as in similar constructions in Greek and Arab traditions;
see [Ifr85, Chapter 21] for details.) Does Σ̄(a + b) = Σ̄(a) + Σ̄(b) always
hold? Give a proof or a counterexample. Does Σ̄(ab) = Σ̄(a) · Σ̄(b) always
hold? Give a proof or a counterexample.

(4) Prove that Σ̄(a + b) = Σ̄(Σ̄(a) + Σ̄(b)) and Σ̄(ab) = Σ̄(Σ̄(a) · Σ̄(b)).

In Lemma 5.2.3 we saw that for each n ∈ ℕ, congruence modulo n satisfied the three
properties of reflexivity, symmetry and transitivity. It turns out that many important
relations found throughout mathematics satisfy these three properties.


Definition 5.3.1. Let A be a set, and let ~ be a relation on A. The relation ~ is an
equivalence relation if it is reflexive, symmetric and transitive. △


Example 5.3.2. Some examples of equivalence relations are equality on the set ℝ;
congruence modulo n on ℤ for any n ∈ ℕ; similarity of triangles on the set of all
triangles in the plane; being the same age on the set of all people. ◊


Because we can form relation classes for arbitrary relations, we can in partic- 
ular form them for equivalence relations. Because relation classes for equivalence 
relations turn out to behave particularly nicely, and are of great importance, we give 
them a special name. 


Definition 5.3.3. Let A be a non-empty set, and let ~ be an equivalence relation on
A. The relation classes of A with respect to ~ are called equivalence classes. △


For the rest of this section, in order to avoid trivial cases, we will restrict our
attention to non-empty sets. We start with the following theorem, which generalizes
Theorem 5.2.7, and which shows that equivalence classes are much better behaved
than arbitrary relation classes, as seen in Example 5.1.4. The proof of Part (1) of the
following theorem, which is left to the reader as an exercise, is simply a rewriting
of the proof of Theorem 5.2.7 (1) in the more general setting of equivalence classes.


Theorem 5.3.4. Let A be a non-empty set, and let ~ be an equivalence relation on
A.

1. Let x, y ∈ A. If x ~ y, then [x] = [y]. If x ≁ y, then [x] ∩ [y] = ∅.
2. ⋃_{x∈A} [x] = A.


Proof. We will prove Part (2), leaving the remaining part to the reader in Exercise 5.3.6.

(2). By definition [x] ⊆ A for all x ∈ A, and hence ⋃_{x∈A} [x] ⊆ A. Now let q ∈ A.
By reflexivity we see that q ~ q, and therefore q ∈ [q] ⊆ ⋃_{x∈A} [x]. Hence A ⊆ ⋃_{x∈A} [x].
We conclude that ⋃_{x∈A} [x] = A.


A careful look at the proofs of both parts of Theorem 5.3.4 reveals that the proof 
of Part (1) uses the symmetry and transitivity of the relation, and the proof of Part (2) 
uses reflexivity; we therefore see precisely where the three properties in the definition 
of equivalence relation are used in this proof. 

There is a redundancy in the expression ⋃_{x∈A} [x] in Theorem 5.3.4 (2), because
some of the sets [x] might be equal to one another. For example, Theorem 5.3.4 (2)
applied to congruence modulo n for a given n ∈ ℕ would say that ··· ∪ [−1] ∪ [0] ∪
[1] ∪ [2] ∪ ··· = ℤ, which is not nearly as strong as the statement in Theorem 5.2.7 (2),
which says that [0] ∪ [1] ∪ ... ∪ [n − 1] = ℤ, and which has no redundancy. The reason
that the statement in Theorem 5.2.7 (2) is stronger is that in the particular case of
congruence modulo n we made use of the Division Algorithm (Theorem A.5 in the
Appendix), which has no analog for arbitrary equivalence relations.

The following corollary is derived immediately from Theorem 5.3.4 (1). 


Corollary 5.3.5. Let A be a non-empty set, let ~ be an equivalence relation on A
and let x, y ∈ A. Then [x] = [y] if and only if x ~ y.


Recall that in Section 5.2, for each n ∈ ℕ we formed the set ℤ_n of all equivalence
classes of ℤ with respect to congruence modulo n. We now turn to the analog of this
construction for arbitrary equivalence relations.


Definition 5.3.6. Let A be a non-empty set, and let ~ be an equivalence relation
on A. The quotient set of A with respect to ~, denoted A/~, is the set defined by
A/~ = {[x] | x ∈ A}. △


The set A/~ in Definition 5.3.6 is the set of all equivalence classes of A with
respect to ~. As in all sets, each element of the set A/~ occurs only once in the set,
even if it might not appear that way from the expression {[x] | x ∈ A}. That is, even
though this expression might make it appear as if there is one element of the form [x]
in A/~ for each x ∈ A, that is not the case in general, because it will often happen
that [x] = [y] for some distinct x, y ∈ A, that is, when x ~ y by Corollary 5.3.5. Looked
at another way, each equivalence class is named after each of its elements, so that a
single equivalence class may have many names, but it is still a single set, and a single
element of A/~.


Example 5.3.7. Let P be the set of all people, and let ~ be the relation on P defined 
by x ~ y if and only if x and y are the same age (in years). If person x is 19 years 
old, then the equivalence class of x is the set of all 19-year-olds. Each element of the 
quotient set P/~ is itself a set, where there is one such set consisting of all 1-year- 
olds, another consisting of all 2-year-olds, and so on. Although there are billions of 
people in P, there are fewer than 125 elements in P/~, because no currently living 
person has reached the age of 125. ◊
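
For a finite set, a quotient set such as the one in Example 5.3.7 can be computed by grouping elements that share a common value. The Python sketch below is an illustration; the function name quotient_by_key and the four people with their ages are hypothetical data invented for this example.

```python
from collections import defaultdict


def quotient_by_key(items, key) -> set:
    """Quotient set for the relation x ~ y if and only if key(x) == key(y)."""
    classes = defaultdict(set)
    for x in items:
        classes[key(x)].add(x)   # x joins the class of everything with its key
    # Each equivalence class becomes a single element of the quotient set.
    return {frozenset(c) for c in classes.values()}


ages = {"Ana": 19, "Ben": 19, "Cai": 2, "Dee": 47}   # hypothetical people
quotient = quotient_by_key(ages, ages.get)

print(len(quotient))                          # 3
print(frozenset({"Ana", "Ben"}) in quotient)  # True
```

Although there are four people in this small example, the quotient set has only three elements, because the two 19-year-olds belong to a single equivalence class.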


In the following definition, which generalizes Definition 5.2.13, we use functions 
to relate a set and its quotient set. 


Definition 5.3.8. Let A be a non-empty set, and let ~ be an equivalence relation on
A. The canonical map for A and ~ is the function γ: A → A/~ defined by γ(x) = [x]
for all x ∈ A. △


We now have a generalization of Lemma 5.2.14. 


Lemma 5.3.9. Let A and B be non-empty sets, let ~ be an equivalence relation on
A and let f: A → B be a function. Suppose that x ~ y implies f(x) = f(y), for all
x, y ∈ A. Then there exists a unique function g: A/~ → B such that f = g ∘ γ, where
γ: A → A/~ is the canonical map.


Proof. Left to the reader in Exercise 5.3.7. 


Using the terminology of Exercise 5.1.9, we say that the function f in Lemma 5.3.9
respects ~. The condition f = g ∘ γ in Lemma 5.3.9 is represented by the following
commutative diagram (as discussed in Section 4.3).


         f
    A -------> B
    |        ↗
  γ |      /  g
    ↓    /
    A/~


Suppose that we have a quotient set A/~. We see from Theorem 5.3.4 that any
two distinct equivalence classes in A/~ are disjoint, and that the union of all the
equivalence classes is the original set A. We can therefore think of A/~ as the result
of breaking up the set A into disjoint subsets. The following definition generalizes
this notion of breaking up a set into disjoint subsets.


Definition 5.3.10. Let A be a non-empty set. A partition of A is a family 𝒟 of
non-empty subsets of A such that

(a) if P, Q ∈ 𝒟 and P ≠ Q, then P ∩ Q = ∅;
(b) ⋃_{P∈𝒟} P = A. △

Definition 5.3.10 (a) can be rephrased more concisely by saying that 𝒟 is pairwise
disjoint, using the terminology of Definition 3.5.1, though here we use the
non-indexed version of the definition. A schematic representation of a partition of a set is
seen in Figure 5.3.1. Another way of looking at partitions is to observe that if 𝒟 is a
family of subsets of a set A, then 𝒟 is a partition of A if and only if for each x ∈ A,
there is one and only one P ∈ 𝒟 such that x ∈ P.




Fig. 5.3.1. 


It is important to distinguish between the mathematical usage of the word “parti- 
tion” and the colloquial usage of the word. In the colloquial usage, a partition that one 
places in a room is something that divides the room into smaller parts; in the math- 
ematical usage, it is the collection of those smaller parts of the room that forms the 
partition of the room, not the dividers between the smaller parts. In Figure 5.3.1, the
partition of the set S has five elements, namely, the sets D_1, ..., D_5, each of which
contains all the elements in the region of the plane with the appropriate label; the
curved lines that separate these five regions do not have a name in mathematical usage
(and indeed, they exist only in pictures, not in actual sets). Observe also that the
word “partition” refers only to the family of sets, not to the elements of the family. In
Figure 5.3.1, the partition is the family 𝒟 = {D_1, ..., D_5}; the elements D_1, ..., D_5
of 𝒟 are not themselves called “partitions,” but rather “elements of the partition.”


Example 5.3.11.

(1) Let E denote the set of even integers, and let O denote the set of odd integers.
Then 𝒟 = {E, O} is a partition of ℤ.

(2) Let 𝒞 = {[n, n + 1)}_{n∈ℤ}. Then 𝒞 is a partition of ℝ.

(3) Let 𝒢 = {(n − 1, n + 1)}_{n∈ℤ}. Then 𝒢 is not a partition of ℝ, because it is
not pairwise disjoint. For example, we observe that (1 − 1, 1 + 1) ∩ (2 − 1, 2 + 1) =
(1, 2). ◊
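
For finite sets, the conditions of Definition 5.3.10 can be checked mechanically. The Python sketch below uses the characterization that each element of A lies in one and only one member of the family; the function name is_partition is invented here, and the sets used are finite stand-ins chosen for this illustration, not the infinite sets of Example 5.3.11.

```python
def is_partition(family, A) -> bool:
    """True if `family` is a partition of the finite set A (Definition 5.3.10)."""
    blocks = {frozenset(P) for P in family}
    A = set(A)
    # Every member of the family must be a non-empty subset of A.
    if any(not P or not P <= A for P in blocks):
        return False
    # Each element of A must lie in one and only one member of the family.
    return all(sum(x in P for P in blocks) == 1 for x in A)


A = set(range(-6, 7))
evens = {x for x in A if x % 2 == 0}
odds = A - evens

print(is_partition([evens, odds], A))               # True
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))    # False (2 lies in both sets)
print(is_partition([{1}, {2}], {1, 2, 3}))          # False (3 is not covered)
```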


Using the terminology of partitions, we can now state the following immediate 
corollary to Theorem 5.3.4. 


Corollary 5.3.12. Let A be a non-empty set, and let ~ be an equivalence relation on 
A. Then A/~ is a partition of A. 


We see from Corollary 5.3.12 that there is a connection between equivalence
relations on a set and partitions of the set. This connection can be made more precise
using bijective functions. To state our result, we will need the following definition,
which takes us to one higher level of abstraction than we have seen until now in our
discussion of relations.


Definition 5.3.13. Let A be a non-empty set. Let ℰ(A) denote the set of all equivalence
relations on A. Let 𝒯_A denote the set of all partitions of A. △


For a given set A, it is important to keep in mind what the elements of ℰ(A) and
𝒯_A are. Each element of ℰ(A) is an equivalence relation on A, which formally is a
subset of A × A that satisfies certain conditions. Each element of 𝒯_A is a partition of
A, which is a family of subsets of A that satisfy certain conditions.


Example 5.3.14. Let A = {1, 2, 3}. Then 𝒯_A = {𝒟_1, 𝒟_2, ..., 𝒟_5}, where

    𝒟_1 = {{1}, {2}, {3}},
    𝒟_2 = {{1, 2}, {3}},
    𝒟_3 = {{1, 3}, {2}},
    𝒟_4 = {{2, 3}, {1}},
    𝒟_5 = {{1, 2, 3}}.

Also, we see that ℰ(A) = {R_1, R_2, ..., R_5}, where these relations are defined by the
sets

    R_1 = {(1,1), (2,2), (3,3)},
    R_2 = {(1,1), (2,2), (3,3), (1,2), (2,1)},
    R_3 = {(1,1), (2,2), (3,3), (1,3), (3,1)},
    R_4 = {(1,1), (2,2), (3,3), (2,3), (3,2)},
    R_5 = {(1,1), (2,2), (3,3), (1,2), (2,1), (1,3), (3,1), (2,3), (3,2)}.

It is straightforward to verify that each of the relations R_i listed above is an equivalence
relation on A. ◊
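
The counts in Example 5.3.14 can be confirmed by exhaustive search, since a three-element set has only 2⁹ = 512 relations. The Python sketch below is an illustration; the helper names is_equivalence and partitions are invented here, and the brute-force approach is practical only for very small sets.

```python
from itertools import combinations, product

A = [1, 2, 3]
pairs = list(product(A, A))   # all 9 elements of A x A


def is_equivalence(R: set) -> bool:
    """Check reflexivity, symmetry and transitivity of a relation R on A."""
    reflexive = all((x, x) in R for x in A)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, w) in R
                     for (x, y) in R for (z, w) in R if y == z)
    return reflexive and symmetric and transitive


# Test every subset of A x A.
relations = [set(c)
             for k in range(len(pairs) + 1)
             for c in combinations(pairs, k)
             if is_equivalence(set(c))]


def partitions(elements: list):
    """Yield every partition of a list of distinct elements, as lists of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        yield [[first]] + smaller                 # `first` in a block of its own
        for i in range(len(smaller)):             # or added to an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]


print(len(relations), len(list(partitions(A))))   # 5 5
```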


Is it a coincidence that the sets ℰ(A) and 𝒯_A in Example 5.3.14 have the same
number of elements? In fact, we will see shortly that for any set A, whether finite
or infinite, there is a correspondence between the equivalence relations on A and the
partitions of A. To state this correspondence precisely, we start by defining, for each
non-empty set A, a function from ℰ(A) to 𝒯_A, and a function in the other direction. It
is not entirely obvious that these functions make sense, but they do indeed work, as
noted in the lemma following the definition.


Definition 5.3.15. Let A be a non-empty set. Let Φ: ℰ(A) → 𝒯_A be defined as follows.
If ~ is an equivalence relation on A, let Φ(~) be the family of sets A/~. Let
Ψ: 𝒯_A → ℰ(A) be defined as follows. If 𝒟 is a partition of A, let Ψ(𝒟) be the relation
on A defined by x Ψ(𝒟) y if and only if there is some P ∈ 𝒟 such that x, y ∈ P, for all
x, y ∈ A. △


Observe that there is a distinct function Φ and a distinct function Ψ for each
non-empty set A, but to avoid unnecessarily cumbersome notation (such as Φ_A and
Ψ_A), we will assume that the set A is always known from the context.

Lemma 5.3.16. Let A be a non-empty set. The functions Φ and Ψ are well-defined.


Proof. To prove the lemma, we need to show the following two things: (1) For any
equivalence relation ~ on A, the family of sets Φ(~) is a partition of A; and (2) for
any partition 𝒟 of A, the relation Ψ(𝒟) is an equivalence relation on A. The first of
these claims follows immediately from the definition of Φ and Corollary 5.3.12. The
second claim is left to the reader in Exercise 5.3.11.


Example 5.3.17.

(1) Let ~ be the relation on ℝ² defined by (x, y) ~ (z, w) if and only if y − x =
w − z, for all (x, y), (z, w) ∈ ℝ². It can be verified that ~ is an equivalence relation. We
want to describe the partition Φ(~) of ℝ². Let (x, y) ∈ ℝ². Then [(x, y)] = {(z, w) ∈
ℝ² | w − z = y − x}. Let c = y − x. Then [(x, y)] = {(z, w) ∈ ℝ² | w = z + c}, which is
just a line in ℝ² with slope 1 and y-intercept c. Hence Φ(~) is the collection of all
lines in ℝ² with slope 1.

(2) Let 𝒞 be the partition of ℝ given in Example 5.3.11 (2). We want to describe
the equivalence relation Ψ(𝒞). For convenience let ≈ = Ψ(𝒞). Suppose that x, y ∈ ℝ.
Then x ≈ y if and only if there is some n ∈ ℤ such that x, y ∈ [n, n + 1). Using the
notation ⌊x⌋ to denote the greatest integer less than or equal to x, we see that x ≈ y if
and only if ⌊x⌋ = ⌊y⌋.

(3) Let 𝒟_1, 𝒟_2, ..., 𝒟_5 be the partitions and R_1, R_2, ..., R_5 be the relations given
in Example 5.3.14. It can be verified that Φ(R_i) = 𝒟_i and Ψ(𝒟_i) = R_i for all i ∈
{1, ..., 5}; details are left to the reader. ◊
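
For a finite set, the functions Φ and Ψ of Definition 5.3.15 can be written out directly, which gives one way to check Part (3) of the example above by machine. The Python sketch below is an illustration under assumptions made here: a relation is stored as a set of ordered pairs, a partition as a set of frozensets, and the names phi and psi are chosen for this example.

```python
def phi(A: set, R: set) -> set:
    """Phi: send an equivalence relation R on A to the quotient set A/R."""
    # The class of x consists of all y with x R y.
    return {frozenset(y for y in A if (x, y) in R) for x in A}


def psi(D: set) -> set:
    """Psi: send a partition D to the relation 'x and y lie in a common P in D'."""
    return {(x, y) for P in D for x in P for y in P}


A = {1, 2, 3}
R2 = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}
D2 = {frozenset({1, 2}), frozenset({3})}

print(phi(A, R2) == D2)            # True
print(psi(D2) == R2)               # True
print(psi(phi(A, R2)) == R2)       # True
print(phi(A, psi(D2)) == D2)       # True
```

The same check can be repeated for the other four pairs of Example 5.3.14; the general statement, for arbitrary non-empty sets, requires proof.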


In Example 5.3.17 (3), we see that Φ and Ψ are inverses of each other. Quite
remarkably, the following theorem says that the same result holds for any non-empty
set. Consequently, we have a complete picture of the connection between equivalence
relations and partitions for a given set A: there is a bijective function from the set of
equivalence relations on A to the set of partitions of A. That is, to each equivalence
relation on A there corresponds a unique partition of A, and vice versa. Moreover, not
only do we know in principle that there is such a correspondence, but, even better,
we have an explicit description of this correspondence, namely, the functions Φ and
Ψ.


Theorem 5.3.18. Let A be a non-empty set. Then the functions Φ and Ψ are inverses
of each other, and hence both are bijective.


Proof. We need to show that

    Ψ ∘ Φ = 1_{ℰ(A)}   and   Φ ∘ Ψ = 1_{𝒯_A}.

First, we prove that Ψ ∘ Φ = 1_{ℰ(A)}. Let ∼ ∈ ℰ(A) be an equivalence relation
on A. Let ≈ = Ψ(Φ(∼)). We will show that ≈ = ∼, and it will then follow that
Ψ ∘ Φ = 1_{ℰ(A)}. For convenience let 𝒟 = Φ(∼), so that ≈ = Ψ(𝒟).

Let x,y ∈ A. Suppose that x ≈ y. Then by the definition of Ψ there is some D ∈ 𝒟
such that x,y ∈ D. By the definition of Φ, we know that D is an equivalence class
of ∼, so that D = [q] for some q ∈ A. Then q ∼ x and q ∼ y, and by the symmetry
and transitivity of ∼ it follows that x ∼ y. Now suppose that x ∼ y. Then y ∈ [x]. By
the reflexivity of ∼, we know that x ∈ [x]. The definition of Φ implies that [x] ∈ 𝒟.
Hence, by the definition of Ψ, it follows that x ≈ y. Therefore x ≈ y if and only if
x ∼ y. We conclude that ≈ = ∼.

Second, we prove that Φ ∘ Ψ = 1_{𝒯_A}. Let 𝒟 ∈ 𝒯_A be a partition of A. Let ℱ =
Φ(Ψ(𝒟)). We will show that ℱ = 𝒟, and it will then follow that Φ ∘ Ψ = 1_{𝒯_A}. For
convenience let ≈ = Ψ(𝒟), so that ℱ = Φ(≈).

Let B ∈ ℱ. Then by the definition of Φ we know that B is an equivalence class of
≈, so that B = [z] for some z ∈ A. Because 𝒟 is a partition of A, then there is a unique
P ∈ 𝒟 such that z ∈ P. Let w ∈ A. Then by the definition of Ψ we see that z ≈ w if and
only if w ∈ P. It follows that w ∈ [z] if and only if w ∈ P, and hence P = [z]. Hence
B = [z] = P ∈ 𝒟. Therefore ℱ ⊆ 𝒟.

Let C ∈ 𝒟. Let y ∈ C. As before, it follows from the definition of Ψ that C = [y].
Therefore by the definition of Φ we see that C ∈ Φ(≈) = ℱ. Hence 𝒟 ⊆ ℱ. We
conclude that ℱ = 𝒟.
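For a finite set, the two round trips in Theorem 5.3.18 can be computed directly. The sketch below is an illustration only, not part of the proof: it represents a relation as a set of ordered pairs and a partition as a set of blocks, and the names `phi`, `psi` and the sample relation `same_parity` are hypothetical choices made for this example.

```python
from itertools import product

def phi(A, related):
    """Send an equivalence relation (a set of ordered pairs on A) to its
    family of equivalence classes."""
    return frozenset(
        frozenset(y for y in A if (x, y) in related) for x in A
    )

def psi(partition):
    """Send a partition to the relation 'x and y lie in a common block'."""
    return frozenset(
        (x, y) for block in partition for x, y in product(block, repeat=2)
    )

A = {1, 2, 3, 4, 5}

# Hypothetical sample relation: x ~ y if and only if x - y is even.
same_parity = frozenset(
    (x, y) for x, y in product(A, repeat=2) if (x - y) % 2 == 0
)

blocks = phi(A, same_parity)
round_trip_relation = psi(phi(A, same_parity))   # should equal same_parity
round_trip_partition = phi(A, psi(blocks))       # should equal blocks
```

Here `blocks` consists of the odd elements and the even elements of A, and both round trips return the object that was started with, as the theorem predicts.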


Exercises 


Exercise 5.3.1. Which of the following relations is an equivalence relation? 


(1) Let M be the relation on R defined by x M y if and only if x − y is an integer,
for all x,y ∈ R.

(2) Let S be the relation on R defined by x S y if and only if x = |y|, for all x,y ∈ R.

(3) Let T be the relation on R defined by x T y if and only if sin x = sin y, for all
x,y ∈ R.

(4) Let P be the set of all people, and let Z be the relation on P defined by x Z y
if and only if x and y are first cousins, for all x,y ∈ P.

(5) Let P be the set of all people, and let R be the relation on P defined by x R y
if and only if x and y have the same maternal grandmother, for all x,y ∈ P.

(6) Let L be the set of all lines in the plane, and let W be the relation on L defined
by α W β if and only if α and β are parallel, for all α, β ∈ L.


Exercise 5.3.2. For each of the following equivalence relations on R, find the equiv- 
alence classes [0] and [3]. 


(1) Let R be the relation defined by a R b if and only if |a| = |b|, for all a,b ∈ R.

(2) Let S be the relation defined by a S b if and only if sin a = sin b, for all a,b ∈ R.

(3) Let T be the relation defined by a T b if and only if there is some n ∈ Z such
that a = 2ⁿb, for all a,b ∈ R.


Exercise 5.3.3. For each of the following equivalence relations on R², give a
geometric description of the equivalence classes [(0,0)] and [(3,4)].


(1) Let Q be the relation defined by (x,y) Q (z,w) if and only if x² + y² = z² + w²,
for all (x,y), (z,w) ∈ R².

(2) Let U be the relation defined by (x,y) U (z,w) if and only if |x| + |y| = |z| +
|w|, for all (x,y), (z,w) ∈ R².

(3) Let V be the relation defined by (x,y) V (z,w) if and only if max{|x|, |y|} =
max{|z|, |w|}, for all (x,y), (z,w) ∈ R².


Exercise 5.3.4. Let A and B be sets, and let f: A → B be a function. Let ∼ be the
relation on A defined by x ∼ y if and only if f(x) = f(y), for all x,y ∈ A.


(1) Prove that ∼ is an equivalence relation.
(2) What can be proved about the equivalence classes of ∼? Does the answer
depend upon whether f is injective and/or surjective?


Exercise 5.3.5. Let A be a set, and let ≈ be a relation on A. Prove that ≈ is an
equivalence relation if and only if the following two conditions hold.


(1) x ≈ x, for all x ∈ A.
(2) x ≈ y and y ≈ z implies z ≈ x, for all x,y,z ∈ A.


Exercise 5.3.6. [Used in Theorem 5.3.4.] Prove Theorem 5.3.4 (1). 
Exercise 5.3.7. [Used in Lemma 5.3.9.] Prove Lemma 5.3.9. 


Exercise 5.3.8. Which of the following families of subsets of R are partitions of
[0, ∞)?

(1) H = {[n − 1, n)}_{n∈N}.
(2) G = {[x − 1, x)}_{x∈[0,∞)}.
(3) F = {{x}}_{x∈[0,∞)}.
(4) I = {[n − 1, n + 1)}_{n∈N}.
(5) J = {[0, n)}_{n∈N}.
(6) K = {[2ⁿ⁻¹ − 1, 2ⁿ − 1)}_{n∈N}.


Exercise 5.3.9. For each of the following equivalence relations, describe the corre- 
sponding partition. Your description of each partition should have no redundancy, 
and should not refer to the name of the relation. 


(1) Let P be the set of all people, and let ≈ be the relation on P defined by x ≈ y
if and only if x and y have the same mother, for all x,y ∈ P.

(2) Let ∼ be the relation on R − {0} defined by x ∼ y if and only if xy > 0, for
all x,y ∈ R − {0}.

(3) Let ≈ be the relation on R² defined by (x,y) ≈ (z,w) if and only if (x − 1)² +
y² = (z − 1)² + w², for all (x,y), (z,w) ∈ R².

(4) Let L be the set of all lines in R², and let ≈ be the relation on L defined by
l₁ ≈ l₂ if and only if l₁ is parallel to l₂ or is equal to l₂, for all l₁, l₂ ∈ L.


Exercise 5.3.10. For each of the following partitions, describe the corresponding 
equivalence relation. Your description of each equivalence relation should not refer 
to the name of the partition. 


(1) Let ℰ be the partition of A = {1,2,3,4,5} defined by ℰ = {{1,5}, {2,3,4}}.

(2) Let 𝒵 be the partition of R defined by 𝒵 = {T_x}_{x∈R}, where T_x = {x, −x} for
all x ∈ R.

(3) Let 𝒟 be the partition of R² consisting of all circles in R² centered at the
origin (the origin is considered a “degenerate” circle).

(4) Let 𝒲 be the partition of R defined by 𝒲 = {[n, n + 2) | n is an even integer}.


Exercise 5.3.11. [Used in Lemma 5.3.16.] Prove Item (2) in the proof of Lemma 5.3.16.


Exercise 5.3.12. Let X and Y be sets, and let h: X → Y be a function. Let ≈ be the
relation on X defined by s ≈ t if and only if h(s) = h(t), for all s,t ∈ X.


(1) Prove that ≈ is an equivalence relation on X.

(2) Let γ: X → X/≈ be the canonical map. Let j: h(X) → Y be the inclusion
map. Prove that there is a unique bijective function ĥ: X/≈ → h(X) such that
h = j ∘ ĥ ∘ γ. This last condition is represented by the following commutative
diagram (as discussed earlier in the text).


               h
       X ------------> Y
       |               ^
     γ |               | j
       v               |
      X/≈ ----------> h(X)
               ĥ

Observe that γ is surjective (because ≈ is reflexive), that ĥ is bijective and
that j is injective. Hence any function can be written as a composition of a
surjective function, a bijective function and an injective function.


Exercise 5.3.13. [Used in Exercise 5.3.14.] Let A be a non-empty set. Let ℛ(A) denote
the set of all relations on A, and let 𝒮_A denote the set of all families of subsets of A.


(1) Clearly ℰ(A) ⊆ ℛ(A) and 𝒯_A ⊆ 𝒮_A. Are these inclusions proper?

(2) Express the sets ℛ(A) and 𝒮_A in terms of products of sets and power sets.

(3) Let A = {1,2}. What are ℛ(A) and 𝒮_A?

(4) Suppose that A is a finite set. Express |ℛ(A)| and |𝒮_A| in terms of |A|. Do
ℛ(A) and 𝒮_A have the same number of elements? Use Example 3.2.9 (2) and
Example 3.3.10 (3).


Exercise 5.3.14. This exercise makes use of the definitions given at the start of
Exercise 5.3.13. We generalize the functions Φ and Ψ given in Definition 5.3.15 as
follows. Let A be a non-empty set. Let Φ̃: ℛ(A) → 𝒮_A be defined as follows. If ≈ is
a relation on A, let Φ̃(≈) be the family of all relation classes of A with respect to ≈.
Let Ψ̃: 𝒮_A → ℛ(A) be defined as follows. If 𝒟 is a family of subsets of A, let Ψ̃(𝒟)
be the relation on A defined by x Ψ̃(𝒟) y if and only if there is some D ∈ 𝒟 such that
x,y ∈ D, for all x,y ∈ A. (There is a distinct function Φ̃ and a distinct function Ψ̃ for
each non-empty set A, but we will assume that the set A is always known from the
context.)


(1) Find a set B and an element 𝒟 ∈ 𝒮_B such that Ψ̃(𝒟) is not reflexive. Find a
set C and an element ℰ ∈ 𝒮_C such that Ψ̃(ℰ) is not transitive.

(2) Suppose that A is finite and has at least two elements. Prove that each of Φ̃
and Ψ̃ is neither injective nor surjective. Is it necessary to restrict our attention
to sets with at least two elements?

(3) Suppose that A has at least two elements. Describe the images of the functions
Φ̃ and Ψ̃, and prove your results. For Φ̃ restrict your attention to the case
where A is finite.


6 


Finite Sets and Infinite Sets 


These are among the marvels that surpass the bounds of our imagination, and 
that must warn us how gravely one errs in trying to reason about infinites by 
using the same attributes that we apply to finites. 

— Galileo Galilei (1564-1642) 


6.1 Introduction 


Infinite sets appear to behave more strangely than finite ones, at least from our per- 
spective as human beings, whose daily experience is of the finite. The difficulty of 
dealing with infinite sets was raised by the ancient Greek Zeno in his four “para- 
doxes”; see [Ang94, Chapter 8] or [Boy91, pp. 74-76] for details. From a modern 
perspective Zeno’s paradoxes are not paradoxes at all, and can be resolved using 
tools from real analysis, developed long after Zeno. However, these paradoxes had a 
historical impact on the study of the infinite, and they indicate how much trickier it 
is to understand the infinite than the finite. 

In order to have a better understanding of sets, and in order to develop tools 
that are very important in a variety of branches of mathematics, we need to under- 
stand how to compare “sizes” of sets, and in particular to understand the difference 
between finite sets and infinite sets, and between countably infinite sets and uncount- 
able sets (both of which are types of infinite sets that will be defined in this chapter). 
In Section 6.5 we discuss the general notion of the cardinality of sets, which is the 
proper way to understand the intuitive notion of the “size” of sets, and we define 
finite sets, countable sets and uncountable sets. In Section 6.6 we discuss some im- 
portant properties of finite sets and countable sets. In Section 6.7 we apply the ideas 
of Sections 6.5 and 6.6 to study the cardinalities of the standard number systems. 
(Further topics pertaining to combinatorial questions about finite sets may be found 
in Sections 7.6 and 7.7.) 

As will be seen in Sections 6.5 and 6.6, the distinctions between finite sets, count- 
able sets and uncountable sets are very much tied to properties of the natural num- 
bers. We therefore start this chapter with a summary of some of the basic properties 


E.D. Bloch, Proofs and Fundamentals: A First Course in Abstract Mathematics, 195 
Undergraduate Texts in Mathematics, DOI 10.1007/978-1-4419-7127-2_6, 
© Springer Science+Business Media, LLC 2011 




of the natural numbers in Section 6.2. We will not prove these properties (doing so 
would take us too far afield), but these properties are very familiar, and the reader can 
either take them on faith, or look at the proofs found in the supplied references. One 
of the most important properties of the natural numbers, indeed one of the defining 
properties of that set of numbers, is the ability to do proof by induction, often referred 
to as the Principle of Mathematical Induction. We discuss proof by induction in detail 
in Section 6.3, both because it is a very useful technique for proofs of certain types of 
statements found in many parts of mathematics, and because in particular it is help- 
ful in proving some aspects of the natural numbers that are needed in later sections 
of this chapter. Another fundamental feature of the natural numbers is definition by 
recursion, which is related to, but not identical with, proof by induction. We discuss 
definition by recursion in Section 6.4, again both to gain a general understanding of 
that concept and because it will be useful later in the chapter. 


6.2 Properties of the Natural Numbers 


Many of the topics in the present chapter depend crucially upon the properties of the 
natural numbers. We will not prove these properties in this text, but will present a 
very brief summary of the minimum that is needed for our subsequent discussion. 
The curious reader can find proofs of all the relevant properties of the natural
numbers in [Blo11, Chapters 1 and 2].

Any rigorous treatment of the natural numbers must ultimately rely upon some 
axioms. There are two standard axiomatic approaches to developing the natural num- 
bers. One approach, involving the minimal axiomatic assumptions and the most ef- 
fort deducing facts from the axioms, is to assume the Peano Postulates for the natural 
numbers, stated below as Axiom 6.2.1. From these postulates, it is then possible to 
prove all the expected properties of the natural numbers, and, no less important, it 
is possible to construct first the integers, then the rational numbers and then the real 
numbers. This process is not brief, and some of the proofs are a bit tricky, though in 
principle all that is needed as background for such a construction is the material about 
sets, functions and relations that we have seen in previous chapters of this text. The 
details of this approach may be found in [Blo11, Chapter 1]. The other approach is to 
assume axioms for the real numbers, and then to locate the natural numbers inside the 
real numbers, and then prove all the usual properties of the natural numbers using 
the properties of the real numbers. This approach is shorter than the previous ap- 
proach, though it is ultimately less satisfying, because much more is being assumed 
axiomatically. The details of this approach may be found in [Blo11, Chapter 2].

The Peano Postulates for the natural numbers are based on the insight that the 
most fundamental property of the natural numbers is the ability to do proof by in- 
duction. Although it might seem that in order to do proof by induction it would also 
be necessary to be able to do other things with the natural numbers such as addition, 
it turns out that very little indeed is needed to do proof by induction. We need to 
have a set, denoted N; we need to have a distinguished element in the set, denoted 1, 
with which to start proof by induction; and we need to have a function from the set
to itself, denoted s: N → N, which intuitively takes each natural number to its
successor. Intuitively, we think of the successor of a natural number as being the result
of adding 1 to the number, though formally the notion of addition does not appear in
the statement of the Peano Postulates.

Of course, not every set with a distinguished element and a function from the set
to itself behaves the way the natural numbers ought to behave, and hence the Peano
Postulates require that the three entities N, 1 and s satisfy a few simple properties. One
of these properties, listed as Part (c) of Axiom 6.2.1 below, is just the formal
statement that proof by induction works. This discussion might seem quite mysterious
to the reader who has not previously encountered proof by induction, but we appeal
to the patience of such a reader, who will see a thorough discussion of this topic in
Section 6.3.


Axiom 6.2.1 (Peano Postulates). There exists a set N with an element 1 ∈ N and a
function s: N → N that satisfy the following three properties.


a. There is no n ∈ N such that s(n) = 1.

b. The function s is injective.

c. Let G ⊆ N be a set. Suppose that 1 ∈ G, and that if g ∈ G then s(g) ∈ G. Then
G = N.


If we think intuitively of the function s in the Peano Postulates as taking each
natural number to its successor, then Part (a) of the postulates says that 1 is the first
number in N, because it is not the successor of anything.
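To see that the three parts of the Peano Postulates are genuine restrictions, one can test them on small finite candidates. The following is a rough sketch under stated assumptions: the two candidate structures (`clock` and `stuck`) are hypothetical examples invented here, and Part (c) is checked by computing the smallest set that contains 1 and is closed under s, which on a finite set is equivalent to the condition stated in the axiom.

```python
def satisfies_part_a(elements, one, s):
    """Part (a): the element 1 is not the successor of any element."""
    return all(s(n) != one for n in elements)

def satisfies_part_b(elements, s):
    """Part (b): the function s is injective."""
    images = [s(n) for n in elements]
    return len(set(images)) == len(images)

def satisfies_part_c(elements, one, s):
    """Part (c), finite case: the smallest set containing 1 and closed
    under s is the whole set."""
    G = {one}
    current = one
    while s(current) not in G:
        current = s(current)
        G.add(current)
    return G == set(elements)

elements = [1, 2, 3, 4, 5]

def clock(n):
    """Hypothetical successor that wraps around: 5 goes back to 1."""
    return n % 5 + 1

def stuck(n):
    """Hypothetical successor that stops growing at 5."""
    return min(n + 1, 5)

clock_report = (
    satisfies_part_a(elements, 1, clock),
    satisfies_part_b(elements, clock),
    satisfies_part_c(elements, 1, clock),
)
stuck_report = (
    satisfies_part_a(elements, 1, stuck),
    satisfies_part_b(elements, stuck),
    satisfies_part_c(elements, 1, stuck),
)
```

The wrap-around candidate fails only Part (a), and the candidate that stops at 5 fails only Part (b); each satisfies the other two parts.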

Although the Peano Postulates (Axiom 6.2.1) do not say that the set N is
unique, in fact that turns out to be true. See [Blo11, Exercise 1.2.8] for a proof. We
can therefore make the following definition.


Definition 6.2.2. The set of natural numbers is the set N, the existence of which is
given in the Peano Postulates. △


How do we know that there is a set, and an element of the set, and a function 
of the set to itself, that satisfy the Peano Postulates? There are two approaches to 
resolving this matter. When we do mathematics, we have to take something as ax- 
iomatic, which we use as the basis upon which we prove all our other results. Hence, 
one approach to the Peano Postulates is to recognize their very reasonable and min- 
imal nature, and to be satisfied with taking them axiomatically. Alternatively, if one 
takes the Zermelo—Fraenkel Axioms as the basis for set theory, then it is not neces- 
sary to assume additionally that the Peano Postulates hold, because the existence of 
something satisfying the Peano Postulates can be derived from the Zermelo—Fraenkel 
Axioms. See [End77, Chapter 4] for details. 

The Peano Postulates are very minimal, and the reader might wonder how we 
can be sure that so minimal a hypothesis really characterizes the natural numbers 
as we intuitively know them. The answer is that we cannot resolve such a question 
rigorously, because we cannot prove things about our intuition. What does turn out 
to be true, as seen rigorously in [Blo11, Chapter 1], is that from the Peano Postulates
we can define all the other familiar aspects of the natural numbers such as addition
and multiplication, and we can prove that these operations satisfy all the
properties we would intuitively expect. Could it happen that one day someone will deduce
something from the Peano Postulates that we would not want to attribute to the nat- 
ural numbers as we intuitively conceive of them? In principle that could happen, 
but given that the Peano Postulates have been around for over a hundred years, and 
have been used extensively by many mathematicians, and no problems have yet been 
found, it seems quite unlikely that any secret problems are lurking around unseen. 

If one goes through the full development of the natural numbers starting from 
the Peano Postulates, the first major theorem one encounters is the one we now state. 
This theorem is used in particular in the definition of addition and multiplication 
of the natural numbers, and although we will not go over those definitions (see the 
reference given above), this theorem has many other applications in many parts of 
mathematics. This theorem, called Definition by Recursion, is in fact so valuable that 
it merits a section of its own, Section 6.4. 

Definition by Recursion allows us to define functions with domain N by defining 
a function at 1, and then defining it at n+ 1 in terms of the definition of the function 
at n. It is important to recognize that recursion, while intimately related to induction, 
is not the same as induction (though it is sometimes mistakenly thought to be); the 
essential difference is that induction is used to prove statements about things that are 
already defined, whereas recursion is used to define things. The proof of the
following theorem, which relies upon nothing but the Peano Postulates (Axiom 6.2.1), is
somewhat tricky; see [Blo11, Theorem 2.5.5] for details.


Theorem 6.2.3 (Definition by Recursion). Let A be a set, let b ∈ A and let k: A → A
be a function. Then there is a unique function f: N → A such that f(1) = b and
f ∘ s = k ∘ f.

The equation f ∘ s = k ∘ f in the statement of Definition by Recursion
(Theorem 6.2.3) means that f(s(n)) = k(f(n)) for all n ∈ N. If s(n) were to be
interpreted as n + 1, as indeed it is once addition for N is rigorously defined (a
definition that requires Definition by Recursion), then f(s(n)) = k(f(n)) would mean that
f(n + 1) = k(f(n)), which looks more familiar intuitively. Additionally, the equation
f ∘ s = k ∘ f can be represented by the following commutative diagram, which as
always means that going either way around the square yields the same result.


            s
       N --------> N
       |           |
     f |           | f
       v           v
       A --------> A
            k
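The content of Definition by Recursion can be illustrated computationally. The sketch below is only an illustration, and it assumes the ordinary integers of the programming language rather than deriving anything from the Peano Postulates; the helper name `define_by_recursion` and the example `powers_of_two` are hypothetical. Given b and k, it produces the function f with f(1) = b and f(n + 1) = k(f(n)).

```python
def define_by_recursion(b, k):
    """Return the function f on the natural numbers with f(1) = b and
    f(s(n)) = k(f(n)), where s(n) = n + 1."""
    def f(n):
        if n < 1:
            raise ValueError("n must be a natural number (n >= 1)")
        value = b
        for _ in range(n - 1):  # apply k exactly n - 1 times
            value = k(value)
        return value
    return f

def s(n):
    """The successor function, interpreted as adding 1."""
    return n + 1

# Hypothetical example: b = 2 and k(x) = 2x give f(n) = 2^n.
powers_of_two = define_by_recursion(2, lambda x: 2 * x)

# The square in the diagram commutes: f(s(n)) equals k(f(n)).
square_commutes = all(
    powers_of_two(s(n)) == 2 * powers_of_two(n) for n in range(1, 20)
)
```

The check `square_commutes` tests the equation f ∘ s = k ∘ f on finitely many inputs; the existence and uniqueness of f for all of N is what the theorem itself asserts.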


Once Definition by Recursion has been established, it is possible to define addi- 
tion, multiplication and the relation less than for the natural numbers, and it is then 
possible to prove all the standard properties of these numbers; many of the proofs, 
not surprisingly, are by induction. The following theorem lists some of the most basic 
properties of addition, multiplication and less than for the natural numbers, though
of course not all such properties are listed. Again, all the details can be found in the
reference cited above.


Theorem 6.2.4. Let a,b,c,d ∈ N.


1. If a + c = b + c, then a = b.

2. (a + b) + c = a + (b + c).

3. s(a) = a + 1.

4. a + b = b + a.

5. a · 1 = a = 1 · a.

6. (a + b)c = ac + bc.

7. ab = ba.

8. c(a + b) = ca + cb.

9. (ab)c = a(bc).

10. If ac = bc then a = b.

11. a ≥ a, and a ≯ a, and a + 1 > a.

12. a ≥ 1, and if a ≠ 1 then a > 1.

13. If a < b and b < c, then a < c; if a ≤ b and b < c, then a < c; if a < b and
b ≤ c, then a < c; if a ≤ b and b ≤ c, then a ≤ c.

14. a < b if and only if a + c < b + c.

15. a < b if and only if ac < bc.

16. Precisely one of the following holds: a < b, or a = b, or a > b (Trichotomy
Law).

17. a ≤ b or b ≤ a.

18. If a ≤ b and b ≤ a, then a = b.

19. It cannot be that b < a < b + 1.

20. a < b if and only if a + 1 ≤ b.

21. If a < b, there is a unique p ∈ N such that a + p = b.


Observe that Theorem 6.2.4 (3) states that the function s is just what we thought 
it would be. Most of the parts of Theorem 6.2.4 are very familiar to the reader, and 
most—though not all—also apply to all real numbers, not just the natural numbers. 
Parts (12) and (15) are specific to the natural numbers, because, intuitively, these 
numbers do not include zero, negative numbers and fractions. Parts (19) and (20) are 
both ways of saying that the natural numbers are “discrete,” a feature not shared by 
the rational numbers or the real numbers. 
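The remarks above about Parts (15) and (19) can be made concrete with a few specific numbers. This is a small sketch with hypothetical sample values; the finite range standing in for N is an assumption made only so that the check terminates.

```python
from fractions import Fraction

# A finite stand-in for the natural numbers, for illustration only.
naturals = range(1, 30)

# Part (19): no natural number lies strictly between b and b + 1.
part_19_on_naturals = not any(
    b < a < b + 1 for a in naturals for b in naturals
)

# The same statement fails for rational numbers: 3/2 lies between 1 and 2.
a, b = Fraction(3, 2), Fraction(1)
part_19_fails_on_rationals = b < a < b + 1

# Part (15) fails once negative numbers are allowed: 1 < 2, but
# multiplying both sides by -1 reverses the inequality.
a_int, b_int, c_int = 1, 2, -1
part_15_fails_on_integers = (a_int < b_int) != (a_int * c_int < b_int * c_int)
```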

The integers are also discrete in the sense of Theorem 6.2.4 (19) (20), so dis- 
creteness does not distinguish between the set of natural numbers and the set of inte- 
gers. There is, however, a very important difference between the natural numbers and 
the integers, which is that the integers intuitively “go to infinity” in two directions, 
whereas the natural numbers do so in only one direction. The following theorem in- 
tuitively combines the discreteness of the natural numbers together with this idea of 
“going to infinity” in only one direction. This theorem has many uses throughout 
mathematics; we will use it later in this chapter. See [Blo11, Theorem 2.4.6] for a
proof.


Theorem 6.2.5 (Well-Ordering Principle). Let A ⊆ N be a set. If A is non-empty,
then there is a unique m ∈ A such that m ≤ a for all a ∈ A.


The hard part of the proof of the Well-Ordering Principle (Theorem 6.2.5) is the 
existence of the number m given in the statement of the theorem; the uniqueness is 
very simple, following immediately from Theorem 6.2.4 (18). 

Finally, we note that in various places in this chapter, we will need to use subsets 
of the natural numbers of the form {a,...,b}. Because the concept of “...” is not in 
itself a rigorous one, we make this notation precise by using the following definition. 
There is nothing subtle in the following definition, but it is important to emphasize 
that writing “...” alone is not rigorous, except when we give it a rigorous meaning 
in specific cases, such as the following. 


Definition 6.2.6. Let a,b ∈ N. The sets {a,...,b} and {a,...} are defined by

    {a,...,b} = {x ∈ N | a ≤ x ≤ b}   and   {a,...} = {x ∈ N | a ≤ x}. △


Because 0 is not a natural number, then technically the set {1,...,0} is not de- 
fined. However, in order to avoid special cases in some proofs, we will allow our- 
selves to write the nonsensical expression “{1,...,0},” and it should be interpreted 
as the empty set. 

For the exercises in this section, the reader should use only the properties of the 
natural numbers stated in this section. Subsequently, the reader should feel free to 
use any standard properties of the natural numbers, as we have done until now. For 
the rest of this chapter, we will at times refer to some of the properties of the natural 
numbers stated in this section to emphasize their role in various proofs. 


Exercises 


Exercise 6.2.1. [Used in Theorem 6.3.11 and Exercise 6.3.16.] Let n ∈ N. Prove that
{1,...,n + 1} − {1,...,n} = {n + 1}.


Exercise 6.2.2. [Used in Theorem 6.6.5.] Let a,b ∈ N. Suppose that a < b.


(1) Let k ∈ N. Prove that there is a bijective function {a,...,b} → {a + k,...,b + k}.
(2) Let p ∈ N be the unique element such that a + p = b, using Theorem 6.2.4 (21).
Prove that there is a bijective function {a,...,b} → {1,...,p + 1}.


Exercise 6.2.3. [Used in Theorem 6.3.6.] Let b ∈ N. Prove that {1,...,b} ∪ {b + 1,...} =
N and {1,...,b} ∩ {b + 1,...} = ∅.


Exercise 6.2.4. [Used in Theorem 6.4.5.] Let H be a non-empty set, let a,b ∈ H and
let p: H × H → H be a function. Prove that there is a unique function g: N → H
such that g(1) = a, that g(s(1)) = b and that g(s(s(n))) = p((g(n), g(s(n)))) for all
n ∈ N.

The main step of the proof is to apply Definition by Recursion (Theorem 6.2.3)
to the set H × H, the element (a,b) and the function k: H × H → H × H defined by
k((x,y)) = (y, p(x,y)) for all (x,y) ∈ H × H. Use the result of that step to find the
desired function g.


Exercise 6.2.5. [Used in Theorem 6.4.3.] Let H be a non-empty set, let e ∈ H and let
q: H × N → H be a function. Prove that there is a unique function h: N → H such
that h(1) = e, and that h(s(n)) = q((h(n),n)) for all n ∈ N.

The main step of the proof is to apply Definition by Recursion (Theorem 6.2.3)
to the set H × N, the element (e,1) and the function r: H × N → H × N defined by
r((x,m)) = (q(x,m), s(m)) for all (x,m) ∈ H × N. Use the result of that step to find
the desired function h.
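The constructions suggested in the hints to Exercises 6.2.4 and 6.2.5 can be tried out on examples before attempting the proofs. The following is a sketch only, and it does not address existence or uniqueness, which are the substance of the exercises. The helper `define_by_recursion` is a hypothetical stand-in for Theorem 6.2.3, and the Fibonacci and factorial examples are assumptions chosen for illustration.

```python
def define_by_recursion(b, k):
    """Stand-in for Theorem 6.2.3: f(1) = b and f(n + 1) = k(f(n))."""
    def f(n):
        value = b
        for _ in range(n - 1):
            value = k(value)
        return value
    return f

def two_term(a, b, p):
    """Exercise 6.2.4 pattern: recursion on pairs with
    k((x, y)) = (y, p(x, y)), then keep the first coordinate."""
    pairs = define_by_recursion((a, b), lambda xy: (xy[1], p(xy[0], xy[1])))
    return lambda n: pairs(n)[0]

def indexed(e, q):
    """Exercise 6.2.5 pattern: recursion on pairs (value, index) with
    r((x, m)) = (q(x, m), m + 1), then keep the first coordinate."""
    pairs = define_by_recursion((e, 1), lambda xm: (q(xm[0], xm[1]), xm[1] + 1))
    return lambda n: pairs(n)[0]

# Hypothetical examples: g(n + 2) = g(n) + g(n + 1), and h(n + 1) = h(n) * (n + 1).
fib = two_term(1, 1, lambda x, y: x + y)
fact = indexed(1, lambda x, m: x * (m + 1))

first_fib = [fib(n) for n in range(1, 8)]
first_fact = [fact(n) for n in range(1, 6)]
```

In both cases the trick is the same: the ordinary one-step recursion of Theorem 6.2.3 is applied to pairs, so that each step has access to the extra information (the previous value, or the current index) that the desired function needs.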


6.3 Mathematical Induction 


Mathematical induction is a very useful method of proving certain types of state- 
ments that involve the natural numbers. It is quite distinct from the informal concept 
of “inductive reasoning,” which refers to the process of going from specific exam- 
ples to more general statements, and which is not restricted to mathematics. When 
we use the phrase “proof by induction” we will always refer to the mathematical sort 
of induction, not this other use of the term. 

More precisely, mathematical induction is a method that can be used to prove
statements of the form (∀n ∈ N)(P(n)), where P(n) is a statement with a free
variable n that is a natural number. For example, we will shortly prove that the statement
P(n) = “8ⁿ − 3ⁿ is divisible by 5” is true for all natural numbers n. The idea of
trying to prove such a statement might arise in many ways,
one of which is by playing around with various numerical examples, for example
looking at 8¹ − 3¹, at 8² − 3², and at 8³ − 3³, and then using informal “inductive
reasoning” to conjecture that 8ⁿ − 3ⁿ is divisible by 5 for all natural numbers n.
Such reasoning by example does not, of course, constitute a proof that this
conjecture is really true. For such a proof we will use induction. The formal statement
of this method, usually referred to as the Principle of Mathematical Induction,
abbreviated PMI, is given below. (For a more general look at proof by induction, see
[End72, Section 1.2].)

The intuitive notion of PMI is that to show that a statement about the natural 
numbers is true for all natural numbers, it is sufficient to show that the statement 
holds for n = 1, and that if it holds for n = 1 then it holds for n = 2, and that if it 
holds for n = 2 then it holds for n = 3, continuing ad infinitum. Of course, we cannot 
prove infinitely many such implications, but it is sufficient to prove that the statement 
is true for n = 1, and that for an arbitrary natural number n, if the statement holds for 
n then it holds for n+ 1. 

Our statement of PMI is given as Theorem 6.3.1 below, and it is stated with- 
out proof, because it is just a restatement of Part (c) of the Peano Postulates (Ax- 
iom 6.2.1). Formally, the statement of PMI gives criteria that guarantee that a subset 
of N subject to certain criteria is in fact all of N. We will see how to use these criteria 
in practice shortly. 


Theorem 6.3.1 (Principle of Mathematical Induction). Let G ⊆ N. Suppose that

a. 1 ∈ G;
b. if n ∈ G, then n + 1 ∈ G.

Then G = N.


It is important to make use of Part (b) of PMI precisely as it is written. This part
has the form P → Q. To show that Part (b) is true in some given situation, we do not
show that P is true or that Q is true, but only that the conditional statement P → Q
is true. In other words, to prove Part (b) of PMI, we do not show directly that n ∈ G,
nor that n + 1 ∈ G, but only that n ∈ G implies n + 1 ∈ G. This fact is what makes
PMI so convenient to use.

We now have our first example of proof by induction. 


Proposition 6.3.2. If n ∈ N, then 8ⁿ − 3ⁿ is divisible by 5.


Proof. Let

    G = {n ∈ N | 8ⁿ − 3ⁿ is divisible by 5}.

We will use PMI to show that G = N, and it will then follow that 8ⁿ − 3ⁿ is divisible
by 5 for all n ∈ N, which is what we need to prove. First, we observe that G ⊆ N by
definition, and hence PMI is applicable. To use PMI, we need to show two things,
which are that 1 ∈ G, and that if n ∈ G then n + 1 ∈ G. We start with the first of these.
Observe that 8¹ − 3¹ = 5, and therefore 8¹ − 3¹ is indeed divisible by 5. Hence 1 ∈ G,
which is Part (a) of the statement of PMI.

To show Part (b) of the statement of PMI, let n ∈ G. We then need to deduce
that n + 1 ∈ G. Because n ∈ G, we know that 8ⁿ − 3ⁿ is divisible by 5, which means
that there is some k ∈ Z such that 8ⁿ − 3ⁿ = 5k (recall the definition of divisibility in
Section 2.2). To show that n + 1 ∈ G will require showing that 8ⁿ⁺¹ − 3ⁿ⁺¹ is divisible
by 5; we can make use of our hypothesis that 8ⁿ − 3ⁿ is divisible by 5 in this proof.
We compute

    8ⁿ⁺¹ − 3ⁿ⁺¹ = 8·8ⁿ − 3·3ⁿ = (5·8ⁿ + 3·8ⁿ) − 3·3ⁿ
                = 5·8ⁿ + 3·(8ⁿ − 3ⁿ) = 5·8ⁿ + 3(5k) = 5(8ⁿ + 3k).

Because n and k are integers, then 8ⁿ + 3k is an integer, and hence 8ⁿ⁺¹ − 3ⁿ⁺¹ is
divisible by 5. It follows that n + 1 ∈ G. We have therefore proved that Part (b) of the
statement of PMI holds. PMI now implies that G = N, and the result is proved.
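The computation at the heart of the inductive step can be checked numerically. The sketch below is a sanity check rather than a proof: it verifies, for a hypothetical range of values of n, both the algebraic identity used in the proof and the divisibility claim itself. The function names are chosen for illustration.

```python
def inductive_identity_holds(n):
    """Check 8^(n+1) - 3^(n+1) = 5 * 8^n + 3 * (8^n - 3^n)."""
    return 8 ** (n + 1) - 3 ** (n + 1) == 5 * 8 ** n + 3 * (8 ** n - 3 ** n)

def divisible_by_5(n):
    """Check that 8^n - 3^n is divisible by 5."""
    return (8 ** n - 3 ** n) % 5 == 0

checked = range(1, 51)  # a finite range; induction is what covers all of N
all_identities = all(inductive_identity_holds(n) for n in checked)
all_divisible = all(divisible_by_5(n) for n in checked)
base_case = 8 ** 1 - 3 ** 1
```

No finite check of this kind can replace the proof, which is precisely the point of the distinction drawn earlier between informal "inductive reasoning" and proof by induction.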


The strategy used in the proof of Proposition 6.3.2 is quite typical. We first de- 
fined the set G; we then showed separately that Parts (a) and (b) of the statement 
of PMI each hold; and we then concluded that the desired result is true. It is often 
possible to make a proof by induction less cumbersome by avoiding mentioning the 
set G explicitly. Suppose that we are trying to show that the statement P(n) holds for
all n ∈ N. The formal way to proceed would be to define the set G to be those
natural numbers for which P(n) is satisfied, and then verify that G = N by showing that
1 ∈ G, and that n ∈ G implies n + 1 ∈ G, for all n ∈ N. The less cumbersome, but just
as valid, way of proceeding is to state that we are trying to prove that the statement
P(n) holds for all n ∈ N by induction. We then show that P(1) holds, and that if P(n)
holds so does P(n + 1) for all n ∈ N. The latter of these two parts is often referred
to as the “inductive step,” and the assumption that P(n) holds in the inductive step is
often referred to as the “inductive hypothesis.” It is often convenient to use clearly
equivalent variants of the inductive step, for example showing that if P(n − 1) holds
then so does P(n) for all n ∈ N such that n ≥ 2. We will see more significant variants
of the inductive step shortly.

The following example of proof by induction, which we write in the less
cumbersome style mentioned above, is quite standard. We note, as always, that “⋯,” as
in the following proposition, is not completely rigorous, unless a valid definition of
“⋯” is provided for the particular case under consideration. Such a definition for the
type of formula in the following proposition is found in Example 6.4.4 (2); we will
not discuss this use of “⋯” more extensively at present, to avoid a detour from our
task at hand, which is proof by induction.


Proposition 6.3.3. If n ∈ N, then

    1 + 2 + ⋯ + n = n(n+1)/2.                                            (6.3.1)


Proof. We prove the result by induction on n. First, suppose that n = 1. Then
1 + 2 + ⋯ + n = 1, and n(n+1)/2 = 1·(1+1)/2 = 1. Therefore Equation 6.3.1 holds for
the case n = 1. Now let n ∈ N. Suppose that Equation 6.3.1 holds for this n. It then
follows from that equation that

    1 + 2 + ⋯ + (n+1) = {1 + 2 + ⋯ + n} + (n+1)
                      = n(n+1)/2 + (n+1)
                      = (n+1)[(n+1)+1]/2.

This last expression is precisely the right-hand side of Equation 6.3.1 with n + 1
replacing n. Hence we have proved the inductive step. Therefore Equation 6.3.1 holds
for all n ∈ N.
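
The two parts of the proof above can be mirrored numerically. The following is a minimal Python sketch, not a substitute for the proof: the function names left_side and right_side are hypothetical, and checking finitely many values of n only illustrates the base case and the algebra of the inductive step.

```python
def left_side(n: int) -> int:
    """Compute 1 + 2 + ... + n by direct addition."""
    return sum(range(1, n + 1))


def right_side(n: int) -> int:
    """Closed form n(n+1)/2 from Equation 6.3.1 (always an integer)."""
    return n * (n + 1) // 2


# Base case: n = 1.
assert left_side(1) == right_side(1) == 1

# Inductive step, checked for sample values: adding (n + 1) to the closed
# form for n gives the closed form for n + 1.
for n in range(1, 1001):
    assert right_side(n) + (n + 1) == right_side(n + 1)
    assert left_side(n) == right_side(n)
```

A finite check of this kind can catch an algebra slip, but only the induction argument covers every n ∈ N.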


It is important to observe that proof by induction shows only that a statement 
of the form P(n) is true for each n ∈ N. We cannot prove that P(n) is true for n =
∞, whatever this might mean. A proof by induction does show that P(n) holds for
infinitely many numbers n, but each such number is a finite number. We do not
consider ∞ to be a natural number (or any other type of real number), and so PMI
does not apply to it. 

Proof by induction is not always as straightforward as it appears. The following 
example is a well-known alleged “proof” by induction, which clearly cannot be valid. 


Example 6.3.4. We will prove that all horses have the same color. More precisely,
we will show that the statement “for any set of n horses, all the horses in the set have
the same color,” is true for all n ∈ N. Because there are only finitely many horses
in the world, it will then follow that all existing horses have the same color. First,
suppose that n = 1. It is certainly true that for any set of one horse, all the horses
in the set have the same color. Next, suppose that the result is true for n, so that for
any set of n horses, all the horses in the set have the same color. We need to show
that the result is true for n + 1. Let {H_1, ..., H_(n+1)} be a set of n + 1 horses. The set
{H_1, ..., H_n} has n horses, so by the inductive hypothesis all the horses in this set
have the same color. On the other hand, the set {H_2, ..., H_(n+1)} also has n horses, so
all horses in this set have the same color. In particular, it then follows that H_n and
H_(n+1) have the same color. Combining this fact with the previous observation that
horses H_1, ..., H_n all have the same color, it follows that H_1, ..., H_(n+1) all have the
same color. We have therefore proved the inductive step. Hence all horses have the
same color.

The reader is asked in Exercise 6.3.5 to find the flaw in the above argument. ♦


The following example gives an application of induction to switching circuits, 
and hence to computers, which are built out of such circuits. 


Example 6.3.5. Digital computers are based on circuits in which each input and
each output is either on or off (as the result of having, or not having, electric
current). These two states are often represented as 1 or 0, respectively. At its simplest,
a switching circuit is a device with some number of inputs, say x_1, ..., x_n, and one
output, say y; each input and the output can have values 1 or 0 only. The switching
circuit takes each collection of values of the inputs, and produces a corresponding
value for the output. A switching circuit can therefore be viewed as a function
f: {0,1}^n → {0,1}, and it can also be represented schematically by the type of
diagram seen in Figure 6.3.1.


[Figure: a box labeled “switching circuit” with inputs x_1, x_2, ..., x_n and output y.]


Fig. 6.3.1.


Different types of calculations require different switching circuits. For each n ∈ N,
there are 2^(2^n) possible switching circuits with n inputs; for the sake of keeping to
the topic at hand, we will omit the proof of this fact, other than to note that it is an
application of Theorem 4.5.4, combined with basic facts about the sizes of products
of finite sets and the sizes of power sets of finite sets, which are proved in Sections 7.6
and 7.7, together with proof by induction. The important point to keep in mind for
the present example is that even for fairly small values of n, the number of possible
switching circuits with n inputs is quite large; for example, when n = 5 there are over
4 billion possible switching circuits. From a manufacturing point of view, it would
therefore be very unfortunate if each possible switching circuit had to be built
by an independent process. Fortunately, as we will now show, all switching circuits
can be built up out of a small number of simple (and familiar) components.
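
The count 2^(2^n) can be made concrete for small n by listing truth tables. The following Python sketch assumes that a switching circuit is identified with its function {0,1}^n → {0,1}, as in the text; the function name count_switching_circuits is hypothetical. It is a brute-force illustration, not the omitted proof.

```python
from itertools import product


def count_switching_circuits(n: int) -> int:
    """Count functions {0,1}^n -> {0,1} by listing every possible output column."""
    rows = list(product((0, 1), repeat=n))        # the 2^n input combinations
    tables = product((0, 1), repeat=len(rows))    # one output bit per input row
    return sum(1 for _ in tables)


# Brute force is feasible only for very small n.
for n in range(1, 5):
    assert count_switching_circuits(n) == 2 ** (2 ** n)

# For n = 5 we rely on the formula instead of enumeration.
print(2 ** (2 ** 5))  # 4294967296, which is "over 4 billion"
```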

In Exercise 1.3.13 we discussed the notion of binary and unary logical operations, 
of which ∧, ∨ and ¬ are examples; we also defined a new binary logical operation,
denoted ⊼. If we replace the values T and F that we used in our discussion of logic
with the values 1 and 0, respectively, then we see that a unary logical operation
is nothing but a switching circuit with one input, and a binary logical operation is
a switching circuit with two inputs. It is common to denote ¬, ∧, ∨ and ⊼ with
schematic symbols, such as those shown in Figure 6.3.2. 


[Figure: schematic symbols for the ¬, ∧, ∨ and nand (⊼) circuits.]


Fig. 6.3.2.


We now prove by induction that every switching circuit can be built up out of
∧, ∨ and ¬ circuits. The induction is on n, the number of inputs in our switching
circuits. That the result is true for n = 1 and for n = 2 follows immediately from
Exercise 1.3.13 (2). Now suppose that the result is true for all switching circuits
with n inputs. Let C be a switching circuit with n + 1 inputs, labeled x_1, ..., x_(n+1).
We define two new switching circuits C_0 and C_1 as follows. Let C_0 be the switching
circuit with inputs x_1, ..., x_n, such that the output of C_0 for each collection of values
of x_1, ..., x_n equals the output of C for the same values of x_1, ..., x_n and the value
x_(n+1) = 0. Define C_1 similarly, using x_(n+1) = 1. The reader can then verify that the
circuit shown in Figure 6.3.3 has the same output as C for each collection of values
of x_1, ..., x_(n+1). Because C_0 and C_1 both have n inputs, it follows from the inductive
hypothesis that each can be constructed out of ∧, ∨ and ¬ circuits. Hence C can be
constructed out of ∧, ∨ and ¬ circuits. By induction, it follows that every switching
circuit can be made out of our three building blocks. Even better, Exercise 1.3.13 (3)
shows that every switching circuit can be built out of only ⊼ circuits.
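
The inductive construction can be sketched as a recursive program. The Python code below is an illustration under stated assumptions: the figure is not reproduced in this text, so the way C_0 and C_1 are combined, namely (¬x_(n+1) ∧ C_0) ∨ (x_(n+1) ∧ C_1), is one standard choice and may differ from the circuit the author drew. The names NOT, AND, OR and build are hypothetical, and the base case handles only n = 1, by listing the four one-input circuits directly.

```python
from itertools import product


def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b


def build(table, n):
    """Return a function on n-tuples of bits, built only from NOT, AND and OR,
    whose outputs agree with `table` (a dict from n-tuples of bits to a bit)."""
    if n == 1:
        f0, f1 = table[(0,)], table[(1,)]
        if (f0, f1) == (0, 0):
            return lambda x: AND(x[0], NOT(x[0]))   # constant 0
        if (f0, f1) == (1, 1):
            return lambda x: OR(x[0], NOT(x[0]))    # constant 1
        if (f0, f1) == (0, 1):
            return lambda x: AND(x[0], x[0])        # the input itself
        return lambda x: NOT(x[0])                  # negation
    # Inductive step: C_0 and C_1 fix the last input at 0 and 1 respectively.
    c0 = build({b[:-1]: out for b, out in table.items() if b[-1] == 0}, n - 1)
    c1 = build({b[:-1]: out for b, out in table.items() if b[-1] == 1}, n - 1)
    return lambda x: OR(AND(NOT(x[-1]), c0(x[:-1])), AND(x[-1], c1(x[:-1])))


# Exhaustive check for n = 3: all 256 truth tables are reproduced exactly.
n = 3
rows = list(product((0, 1), repeat=n))
for outputs in product((0, 1), repeat=len(rows)):
    table = dict(zip(rows, outputs))
    circuit = build(table, n)
    assert all(circuit(bits) == table[bits] for bits in rows)
```

The recursion depth equals the number of inputs, which matches the structure of the induction on n.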

See [LP98, Sections 2.7 and 2.8] or [Fab92] for more about switching circuits. ♦


There are various alternative versions of PMI, each of which is useful in cer- 
tain situations where PMI might not be directly applicable. There do not seem to be 
standard names for these variants. Different texts use terms such as “Extended Princi- 
ple of Mathematical Induction,” “Second Principle of Mathematical Induction,” and 
the like. We will simply call them Principle of Mathematical Induction—Variant 1,
Variant 2 and Variant 3, respectively, using the abbreviations PMI-V1, PMI-V2 and
PMI-V3. All three variants work similarly to PMI, in that they all have two parts, the 
second of which is the inductive step. 


[Figure: a circuit with inputs x_1, x_2, ..., x_n and x_(n+1).]


Fig. 6.3.3.


The first of the variants on PMI is useful when we wish to prove that a statement
P(n) is true for all natural numbers n such that n ≥ k_0, for some given natural number
k_0.


Theorem 6.3.6 (Principle of Mathematical Induction—Variant 1). Let G ⊆ N,
and let k_0 ∈ N. Suppose that


a. k_0 ∈ G;
b. if n ∈ {k_0, ...} and n ∈ G, then n + 1 ∈ G.


Then {k_0, ...} ⊆ G.


Proof. First, suppose that k_0 = 1. It then follows from Theorem 6.2.4 (12) that the
condition “n ≥ k_0” is true for all n ∈ N. In particular, we see that {k_0, ...} = N.
Because G ⊆ N, the statement “{k_0, ...} ⊆ G” is then equivalent to “G = N.” It
follows that when k_0 = 1, the statement of PMI-V1 is equivalent to the statement
of PMI (Theorem 6.3.1), and so there is nothing to prove in this case. From now
on assume that k_0 ≠ 1. By Theorem 6.2.4 (12) (21) there is some b ∈ N such that
b + 1 = k_0.

Let G′ = {1, ..., b} ∪ G. We will show that G′ = N. It will then follow that
{1, ..., b} ∪ G = N, and hence that {k_0, ...} ⊆ G by using Exercise 6.2.3 and
Exercise 3.3.10.

We now use PMI to show that G′ = N. By definition we know that 1 ∈ G′. Suppose
that g ∈ G′. We will show that g + 1 ∈ G′, and the proof will be complete. By
Theorem 6.2.4 (16) we know that precisely one of the following holds: either g < b,
or g = b, or g > b. We treat each case separately.

Case 1: Suppose that g < b. Then g + 1 ≤ b by Theorem 6.2.4 (20). By Part (12) of
the same theorem, we know that 1 ≤ g + 1, and hence g + 1 ∈ {1, ..., b} ⊆ G′.

Case 2: Suppose that g = b. Then g + 1 = b + 1 = k_0. Hence g + 1 ∈ G ⊆ G′.

Case 3: Suppose that g > b. Then g ≰ b by Theorem 6.2.4 (16), and hence g ∉
{1, ..., b}. Because g ∈ G′ = {1, ..., b} ∪ G, it follows that g ∈ G. Moreover, because
g > b, we know by Theorem 6.2.4 (20) that g ≥ b + 1 = k_0, and hence g ∈ {k_0, ...}.
We now use the hypothesis on G to see that g + 1 ∈ G ⊆ G′.


Observe that in PMI-V1 we do not deduce that G = N, only that {k_0, ...} ⊆ G. It
might be the case that the set G contains numbers less than k_0, but we cannot deduce
that from the statement of PMI-V1. The following proof is an example of the use 
of PMI-V1. As always, note the difference between the scratch work and the actual 
proof. 


Proposition 6.3.7. If n ∈ N and n ≥ 5, then 4^n > n^4.


Scratch Work. For the case n = 5, it is easy to verify that 4^5 > 5^4. Now suppose that
we know the result for some n, so that 4^n > n^4. We want to deduce that 4^(n+1) >
(n+1)^4. By brute force multiplication, or using the binomial formula (Theorem 7.7.14),
we see that (n+1)^4 = n^4 + 4n^3 + 6n^2 + 4n + 1. Because this expression has a number
of pieces, it might be helpful to write 4^(n+1) = 4·4^n = 4^n + 4^n + 4^n + 4^n. Because
we know that 4^n > n^4, it would suffice to show that the three inequalities 4^n > 4n^3 and
4^n > 6n^2 and 4^n > 4n + 1 hold. To show these inequalities, we can make use of the
fact that n ≥ 5, as well as the fact that 4^n > n^4. First, we observe that 4n^3 < 5n^3 ≤
n·n^3 = n^4 < 4^n. Next, we observe that 6n^2 < 5^2·n^2 ≤ n^2·n^2 = n^4 < 4^n. Finally, we
have 4n + 1 < 4n + n = 5n ≤ n·n < n^4 < 4^n. Putting all these observations together
will do the trick.


Proof. We prove the result by induction on n, making use of PMI-V1 with k_0 = 5.
First, suppose that n = 5. Then 4^5 = 1024 > 625 = 5^4. Hence the desired result holds
when n = 5. Now suppose that the result holds for some n ∈ N such that n ≥ 5, which
means that 4^n > n^4 for this n. We start with three preliminary observations, which
are

    4^n > n^4 > n^2 ≥ 5n = 4n + n > 4n + 1,

and

    4^n > n^4 ≥ 5^2·n^2 > 6n^2,

and

    4^n > n^4 > 4n^3.

Combining the three inequalities with the inductive hypothesis we obtain

    4^(n+1) = 4·4^n = 4^n + 4^n + 4^n + 4^n > n^4 + 4n^3 + 6n^2 + (4n + 1) = (n+1)^4.

Therefore the desired result holds for n + 1. The proof is then complete by PMI-V1.
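
The pieces used in the scratch work and the proof can be checked numerically for a range of n. This Python sketch is only a sanity check on the arithmetic; the variable name failures and the upper bound 200 are arbitrary choices made here. It also shows why the proposition starts at k_0 = 5.

```python
for n in range(5, 200):
    # The statement itself and the three preliminary inequalities.
    assert 4 ** n > n ** 4
    assert 4 ** n > 4 * n ** 3
    assert 4 ** n > 6 * n ** 2
    assert 4 ** n > 4 * n + 1
    # The two identities used to combine them.
    assert (n + 1) ** 4 == n ** 4 + 4 * n ** 3 + 6 * n ** 2 + 4 * n + 1
    assert 4 ** (n + 1) == 4 ** n + 4 ** n + 4 ** n + 4 ** n

# Values of n below 5 for which 4^n > n^4 fails.
failures = [n for n in range(1, 5) if not 4 ** n > n ** 4]
print(failures)  # [2, 3, 4]
```

The inequality happens to hold for n = 1, but it fails for n = 2, 3 and 4, so PMI alone (starting at 1) could not be used here.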


The second variant on PMI again reverts to starting at n = 1, and to deducing that 
the set G equals all of N, but it has a slightly different type of inductive step than 
either PMI or PMI-V1. 


Theorem 6.3.8 (Principle of Mathematical Induction—Variant 2). Let G ⊆ N.
Suppose that


a. 1 ∈ G;
b. if n ∈ N and {1, ..., n} ⊆ G, then n + 1 ∈ G.


Then G = N.


Proof. Suppose that G ≠ N; we will derive a contradiction. Let H = N − G. Because
H ⊆ N and H ≠ ∅, the Well-Ordering Principle (Theorem 6.2.5) implies that there is
some m ∈ H such that m ≤ h for all h ∈ H. Because 1 ∈ G we know that 1 ∉ H, and
therefore m ≠ 1. By Theorem 6.2.4 (12) (21) there is some b ∈ N such that b + 1 = m.

Let p ∈ {1, ..., b}. It follows that p ≤ b < b + 1 = m by Theorem 6.2.4 (11).
Part (16) of the same theorem implies that p ≱ m. Therefore p ∉ H, and so p ∈ G.
We have therefore shown that {1, ..., b} ⊆ G. Part (b) of the hypothesis on G then
says that b + 1 ∈ G, which means that m ∈ G. This last statement is a contradiction
to the fact that m ∈ H. We conclude that G = N.


When using PMI-V2, the inductive step involves showing that if the desired statement
is assumed to hold for all values in {1, ..., n}, then it holds for n + 1. This
method contrasts with PMI and PMI-V1, where we showed that if the statement is 
assumed to hold only for n, then it holds for n+ 1. It might appear as if we are unfairly 
making life easier for ourselves when we use PMI-V2, by allowing a larger hypoth- 
esis in order to derive the same conclusion, but PMI-V2 has been derived rigorously 
from PMI, and so we are free to use it whenever we need to. (The proof of PMI-V2 
does not appear to make use of PMI, but the latter is nonetheless used implicitly, 
because it is needed for the proof of the Well-Ordering Principle (Theorem 6.2.5); 
see [Blo11, Theorem 2.4.6] for details.)

Our third variant on PMI combines the first two variants. 


Theorem 6.3.9 (Principle of Mathematical Induction—Variant 3). Let G ⊆ N,
and let k_0 ∈ N. Suppose that


a. k_0 ∈ G;
b. if n ∈ {k_0, ...} and {k_0, ..., n} ⊆ G, then n + 1 ∈ G.


Then {k_0, ...} ⊆ G.


Proof. Left to the reader in Exercise 6.3.13.


An example of using PMI-V3 is the proof of the following theorem, which is
a basic tool in number theory. An examination of the proof reveals why PMI-V3 
is used in this case rather than PMI-V1. Recall the definition of prime numbers in 
Definition 2.3.6. 


Theorem 6.3.10. Let n ∈ N. Suppose that n ≥ 2. Then n is either a prime number or
a product of finitely many prime numbers.


Proof. We will use PMI-V3 with k_0 = 2. First, suppose that n = 2. Because 2 is
a prime number, the desired result is true for n = 2. Now let n ∈ N. Suppose that
n ≥ 2, and that the desired result holds for all natural numbers in the set {2, ..., n};
that is, we assume that each of the numbers in {2, ..., n} is either a prime number
or a product of finitely many prime numbers. We need to show that n + 1 is either
a prime number or a product of finitely many prime numbers. There are two cases,
depending upon whether or not n + 1 is a prime number. If n + 1 is a prime number,
then there is nothing to prove. Now assume that n + 1 is not a prime number. Then
there are natural numbers a and b such that n + 1 = ab, and that 1 < a < n + 1 and
1 < b < n + 1. Therefore a, b ∈ {2, ..., n}. By the inductive hypothesis we know that
each of a and b is either a prime number or a product of finitely many prime numbers.
It now follows that n + 1 = ab is the product of finitely many prime numbers.
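
The case split in this proof translates directly into a recursive procedure, in which the recursive calls play the role of the inductive hypothesis of PMI-V3. The Python sketch below is an illustration only; the function name prime_factors is hypothetical, and trial division is chosen for clarity, not efficiency.

```python
def prime_factors(n: int) -> list:
    """Return a list of primes whose product is n, for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for a in range(2, n):
        if n % a == 0:
            b = n // a
            # Here n = ab with 1 < a < n and 1 < b < n, so both recursive
            # calls are on strictly smaller numbers in {2, ..., n - 1}.
            return prime_factors(a) + prime_factors(b)
    return [n]  # no proper divisor was found, so n is prime


print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```

Note that the recursion may call itself on any smaller number, not just on n − 1, which is exactly why PMI-V3 is needed in place of PMI-V1.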


The above result can be proved for all integers (and not just natural numbers), 
and it can also be proved that the decomposition into prime numbers is unique. The 
version of the theorem for integers that includes both existence and uniqueness is 
known as the Fundamental Theorem of Arithmetic. See [Ros05, Section 3.5] for
details. 

We conclude this section with the following somewhat technical theorem about 
functions between sets of the form {1, ..., n}; we will need this theorem when we
discuss properties of finite sets in Section 6.6. 


Theorem 6.3.11. Let n, k ∈ N.


1. Let f: {1, ..., n} → N be a function. Then there is some q ∈ {1, ..., n} such
that f(q) ≥ f(i) for all i ∈ {1, ..., n}.

2. Let S ⊆ {1, ..., n} be a non-empty subset. Then there is a bijective function
g: {1, ..., n} → {1, ..., n} such that g(S) = {1, ..., r} for some r ∈ N such
that r ≤ n. If S is a proper subset of {1, ..., n}, then r < n.

3. Let f: {1, ..., n} → {1, ..., k} be a function. If f is bijective, then n = k. If f
is injective but not surjective, then n < k.


Proof. We will prove Part (3), leaving the rest to the reader in Exercise 6.3.16.


(3). First, suppose that f is bijective. We prove the result by induction on k,
where for each k we will assume that n is arbitrary. Suppose that k = 1. Then
{1, ..., k} = {1}. If p: {1, ..., n} → {1, ..., k} is a bijective function, then {1, ..., n}
must also have one element, which implies that n = 1. Hence n = k.

Now suppose that the result is true for some k ∈ N. Let h: {1, ..., n} → {1, ..., k + 1}
be a function. Suppose that h is bijective. We know that k + 1 > k ≥ 1. It follows that
{1, ..., k + 1} has more than one element, and hence it must be the case that n > 1
in order for h to be bijective. By Theorem 6.2.4 (21) we see that there is some q ∈ N
such that q + 1 = n. There are now two cases.

Suppose first that h(n) = k + 1. Let ĥ: {1, ..., q} → {1, ..., k} be defined by
ĥ(a) = h(a) for all a ∈ {1, ..., q}, which makes sense because of Exercise 6.2.1 and
the fact that h is injective. Because h is bijective, and because h(n) = k + 1, it follows
that ĥ is bijective. The inductive hypothesis applied to ĥ implies that q = k. It follows
that n = q + 1 = k + 1. Hence the result holds for k + 1.

Suppose second that h(n) ≠ k + 1. Then, using Exercise 6.2.1 again, together with
the fact that h is bijective, we deduce that k + 1 = h(s) for a unique s ∈ {1, ..., q}.
Let h̃: {1, ..., n} → {1, ..., k + 1} be defined by

           h(n),    if a = s
    h̃(a) = k + 1,   if a = n
           h(a),    otherwise.

Then h̃ is bijective, and h̃(n) = k + 1. By applying the previous case to h̃, we deduce
that n = k + 1. The proof in the case that f is bijective is now complete.

Next, suppose that f is injective but not surjective. Then f({1, ..., n}) ⊊ {1, ..., k}.
Let f̂: {1, ..., n} → f({1, ..., n}) be defined by f̂(a) = f(a) for all a ∈ {1, ..., n}.
Then f̂({1, ..., n}) = f({1, ..., n}), and f̂ is bijective. By Part (2) of this theorem
there is a bijective function g: {1, ..., k} → {1, ..., k} such that g(f({1, ..., n})) =
{1, ..., r} for some r ∈ N such that r < k. Let ĝ: f({1, ..., n}) → {1, ..., r} be
defined by ĝ(a) = g(a) for all a ∈ f({1, ..., n}). Then ĝ is bijective. It follows
from Exercise 4.3.5 that (ĝ ∘ f̂)({1, ..., n}) = ĝ(f̂({1, ..., n})) = ĝ(f({1, ..., n})) =
{1, ..., r}. Because f̂ and ĝ are both bijective, then ĝ ∘ f̂: {1, ..., n} → {1, ..., r} is
bijective by Lemma 4.4.4 (3). It now follows from what we proved about bijective
functions that n = r. We deduce that n < k.
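
Part (3) of the theorem can be tested by brute force for small n and k. In the Python sketch below, a function f: {1, ..., n} → {1, ..., k} is represented as the tuple (f(1), ..., f(n)); this representation and the helper names all_functions, is_injective and is_surjective are assumptions made for the illustration.

```python
from itertools import product


def all_functions(n, k):
    """Yield every function {1..n} -> {1..k} as a tuple (f(1), ..., f(n))."""
    return product(range(1, k + 1), repeat=n)


def is_injective(f):
    return len(set(f)) == len(f)


def is_surjective(f, k):
    return set(f) == set(range(1, k + 1))


for n in range(1, 5):
    for k in range(1, 5):
        for f in all_functions(n, k):
            inj, sur = is_injective(f), is_surjective(f, k)
            if inj and sur:
                assert n == k    # bijective forces n = k
            if inj and not sur:
                assert n < k     # injective but not surjective forces n < k
```

Such a search covers only finitely many pairs (n, k); the induction on k in the proof is what handles all of them.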


Exercises 


Exercise 6.3.1. Prove that each of the following formulas holds for all n ∈ N.


(1) 1 + 3 + 5 + ⋯ + (2n − 1) = n^2.
(2) 1^2 + 2^2 + ⋯ + n^2 = n(n+1)(2n+1)/6.
(3) 1^3 + 2^3 + ⋯ + n^3 = n^2(n+1)^2/4.
(4) 1^3 + 3^3 + ⋯ + (2n − 1)^3 = n^2(2n^2 − 1).
(5) 1·2 + 2·3 + ⋯ + n(n+1) = n(n+1)(n+2)/3.
(6) 1/(1·2) + 1/(2·3) + ⋯ + 1/(n(n+1)) = n/(n+1).


Exercise 6.3.2. Prove that 1 + 2n ≤ 3^n for all n ∈ N.


Exercise 6.3.3. Let a, b ∈ N. Prove that a^n − b^n is divisible by a − b for all n ∈ N.


Exercise 6.3.4. [Used in Theorem 6.6.7.] Let f: N → N be a function. Suppose that
f(n) < f(n + 1) for all n ∈ N. Prove that f(n) ≥ n for all n ∈ N. Be explicit about
which properties of N, as stated in Section 6.2, you are using.


Exercise 6.3.5. [Used in Example 6.3.4.] Find the flaw in Example 6.3.4. 


Exercise 6.3.6. For which values of n ∈ N does the inequality n^2 − 9n + 19 > 0 hold?
Prove your answer by induction. 


Exercise 6.3.7. Prove that (1 + 1/n)^n < n for all n ∈ N such that n ≥ 3.
Exercise 6.3.8. Prove that 7n < 2^n for all n ∈ N such that n ≥ 6.
Exercise 6.3.9. Prove that 3^n > n^3 for all n ∈ N such that n ≥ 4.


Exercise 6.3.10. Prove that 


for all n ∈ N.


Exercise 6.3.11. Prove that 


for all n ∈ N such that n > 2. (The symbol ∏ denotes the product of all the terms.)


Exercise 6.3.12. Prove that 


for all n ∈ N such that n > 2.
Exercise 6.3.13. [Used in Theorem 6.3.9.] Prove Theorem 6.3.9. 


Exercise 6.3.14. [Used in Theorem 6.6.9 and Exercise 6.6.12.] Let f: N → N ∪ {0} be
a function. Suppose that f(1) = 0, and that if n < m then f(n) < f(m), for all n, m ∈
N. Prove that for each x ∈ N, there are unique n, p ∈ N such that f(n) < x ≤ f(n + 1)
and x = f(n) + p. (If, for example, we let b ∈ N, and we use the function f defined
by f(n) = (n − 1)b for all n ∈ N, then we obtain a variant of the Division Algorithm
(Theorem A.5).)


Exercise 6.3.15. [Used in Theorem 6.4.8.] Let p ∈ N, and let G ⊆ N. Suppose that


a. 1 ∈ G;
b. if n ∈ {1, ..., p − 1} and {1, ..., n} ⊆ G, then n + 1 ∈ G.


Prove that {1, ..., p} ⊆ G.


Exercise 6.3.16. [Used in Theorem 6.3.11.] Prove Theorem 6.3.11 (1) (2). 
[Use Exercise 6.2.1.]


Exercise 6.3.17. Let k, m ∈ N, and let f: {1, ..., m} → {1, ..., k} be a function.
Prove that if m > k, then f is not injective. A combinatorial interpretation of this 
fact is known as the Pigeonhole Principle, which says that if m objects are placed 
in k boxes, where m > k, then there will be a box with more than one object in it. 
Though this principle may seem innocuous, it is very important in combinatorics. 
See [Rob84, Section 8.1] for further discussion and applications. 


6.4 Recursion 


Consider the familiar sequence 1, 2, 4, 8, 16, .... If we let a_n denote the nth term of the
sequence, then a_n = 2^(n−1) for all n ∈ N. Such a formula describes each term of the
sequence explicitly in terms of n, and is a very convenient way of describing the
sequence. There is, however, another useful way of describing this sequence, which is
by stating that a_1 = 1, and that a_(n+1) = 2a_n for all n ∈ N. Such a description is called
a recursive description of the sequence. Recursion, of which we will see some inter- 
esting examples shortly, is important not only in mathematics, but also in logic, and 
in the application of logic to computer science; see [Rob86] or [DSW94, Chapter 3] 
for details. See [End72, Section 1.2] for a more general look at the mathematical 
approach to recursion, and see [Rob84, Section 5.1] for various applied uses of re- 
cursion. 

Given a sequence for which we already have an explicit formula for each a_n
in terms of n, it can be useful to find a recursive formula, but there is no question
that the sequence exists. What about a sequence for which we have only a recursive
description, but no explicit formula? For example, suppose that we have the
recursive description c_1 = 4, and c_(n+1) = 3 + 2c_n for all n ∈ N. Is there a sequence
c_1, c_2, c_3, ... satisfying such a description? That is, does this description actually
define a sequence? It does appear intuitively as if there is such a sequence, because we
can proceed “inductively,” producing one element at a time. We know that c_1 = 4.
We then compute c_2 = 3 + 2c_1 = 3 + 2·4 = 11, and c_3 = 3 + 2c_2 = 3 + 2·11 = 25,
and so on. We could continue indefinitely in this way, and it would seem that the
sequence c_1, c_2, c_3, ... is defined for all n ∈ N. Our intuition will turn out to be correct,
and the sequence is indeed defined, and moreover uniquely defined, for all n ∈ N. In
fact, we will give an explicit formula for this sequence in Example 6.4.2.

However, although the method of definition by recursion for defining sequences 
can be made completely rigorous, it is not as simple as we made it appear in the 
previous paragraph. Just saying “proceed inductively” is not satisfactory. Proof by 
induction, as discussed in Section 6.3, works for something that is already defined; 
here, by contrast, we are defining something, so proof by induction is not applicable. 
Of course, once something is defined by recursion, it is very common to prove things 
about it using induction. 

There are a number of variations of the process of definition by recursion, the 
most basic of which is as follows. Suppose that we are given a number b ∈ R, and a
function h: R → R. We then want to define a sequence a_1, a_2, ... such that a_1 = b and
that a_(n+1) = h(a_n) for all n ∈ N. To be more precise, recall from Example 4.5.2 (4)
that the formal definition of a sequence of real numbers is simply a function f: N →
R, which can be converted to the more standard notation for sequences by letting
a_n = f(n) for all n ∈ N. Although the sequences discussed in Example 4.5.2 (4) were
in R, the same approach applies to sequences in any set, so that a sequence in the set
A is simply a function f: N → A.

We can now state the theorem that guarantees the validity of definition by re- 
cursion. We have in fact already seen this theorem in Section 6.2, stated as Theo- 
rem 6.2.3, and we are simply restating it here in a form that is more familiar and easy 
to use. 


Theorem 6.4.1 (Definition by Recursion). Let A be a set, let b ∈ A and let k: A → A
be a function. Then there is a unique function f: N → A such that f(1) = b, and that
f(n + 1) = k(f(n)) for all n ∈ N.


Stated more informally, Definition by Recursion (Theorem 6.4.1) says that if A
is a set, if b ∈ A and if k: A → A is a function, then there is a unique sequence
a_1, a_2, a_3, ... in A such that a_1 = b, and that a_(n+1) = k(a_n) for all n ∈ N.
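
The informal statement has a direct computational counterpart. The following Python sketch uses a generator, with the hypothetical name recursive_sequence, to produce a_1, a_2, a_3, ... from b and k. A program can only ever produce finitely many terms, so this illustrates the recursion itself and not the existence and uniqueness claim of the theorem.

```python
from itertools import islice
from typing import Callable, Iterator, TypeVar

A = TypeVar("A")


def recursive_sequence(b: A, k: Callable[[A], A]) -> Iterator[A]:
    """Yield a_1, a_2, a_3, ... where a_1 = b and a_(n+1) = k(a_n)."""
    a = b
    while True:
        yield a
        a = k(a)


# a_1 = 1 and a_(n+1) = 2 a_n gives the powers of 2.
powers = list(islice(recursive_sequence(1, lambda x: 2 * x), 5))
print(powers)  # [1, 2, 4, 8, 16]

# c_1 = 4 and c_(n+1) = 3 + 2 c_n, as in the discussion above.
c_terms = list(islice(recursive_sequence(4, lambda x: 3 + 2 * x), 3))
print(c_terms)  # [4, 11, 25]
```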


Example 6.4.2. 


(1) We previously asked whether there is a sequence that satisfies the conditions
c_1 = 4, and c_(n+1) = 3 + 2c_n for all n ∈ N. We can now treat this example rigorously.
Let b = 4, and let h: R → R be defined by h(x) = 3 + 2x for all x ∈ R. Then Definition
by Recursion (Theorem 6.4.1) tells us that there is a unique function f: N → R such
that f(1) = 4, and that f(n + 1) = 3 + 2f(n) for all n ∈ N. If we let c_n = f(n)
for all n ∈ N, then the sequence c_1, c_2, c_3, ... satisfies the conditions c_1 = 4, and
c_(n+1) = 3 + 2c_n for all n ∈ N.

Definition by Recursion tells us only that the sequence c_1, c_2, c_3, ... with the
desired properties exists; it does not give us an explicit formula for this sequence. It
is not always possible to find an explicit formula for every sequence defined by
recursion, although in the present case such a formula can be found. By calculating
the first few terms of the sequence, and a bit of trial and error, it is possible to guess
the formula c_n = 7·2^(n−1) − 3 for all n ∈ N. To prove that this formula holds, we use
PMI. First, we show that the formula holds for n = 1, which is seen by computing
7·2^(1−1) − 3 = 4, and observing that c_1 = 4. Next, suppose that the result holds for
some n ∈ N, which means that c_n = 7·2^(n−1) − 3 for this n. We then show that the
result holds for n + 1, which we accomplish by computing

    c_(n+1) = 3 + 2c_n = 3 + 2{7·2^(n−1) − 3} = 7·2^((n+1)−1) − 3.

It then follows from PMI that the formula holds for all n ∈ N.
(2) Let A be a non-empty set, and let f: A → A be a function. For any n ∈ N, we
would like to define a function, denoted f^n, by the formula

    f^n = f ∘ ⋯ ∘ f   (n times).

However, anything involving “⋯” is not rigorous, unless the “⋯” is an abbreviation
for something that has been rigorously defined, which we can do in the present case
by using Definition by Recursion.

Recall the notation ℱ(A, A) defined in Section 4.5. Let k: ℱ(A, A) → ℱ(A, A)
be defined by k(g) = f ∘ g for all g ∈ ℱ(A, A). We can then apply Definition by
Recursion (Theorem 6.4.1) to the set ℱ(A, A), the element f ∈ ℱ(A, A) and the
function k: ℱ(A, A) → ℱ(A, A), and we deduce that there is a unique function
φ: N → ℱ(A, A) such that φ(1) = f and that φ(n + 1) = k(φ(n)) = f ∘ φ(n) for all
n ∈ N. We now simply let the notation “f^n” be defined to mean φ(n), for all n ∈ N.
Then f^1 = f, and f^(n+1) = f ∘ f^n for all n ∈ N, just as expected. We refer to f^n as the
n-fold iteration of f. This topic was discussed briefly in Exercise 4.4.20 and
Exercise 4.4.21, where we assumed that f^n was defined intuitively, because we did not
yet have Definition by Recursion at our disposal. Iterations of functions are widely
used in mathematics, and in particular are central to the study of dynamical systems
and chaos; see [ASY97]. ♦
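
Both parts of this example can be explored computationally. The Python sketch below first compares the recursively defined c_n with the guessed formula 7·2^(n−1) − 3 for a range of n, and then builds the n-fold iteration f^n by the rule f^1 = f and f^(n+1) = f ∘ f^n. The names c, iterate and add_three are hypothetical, and the finite comparison is a check on the guess, not a replacement for the proof by PMI.

```python
def c(n: int) -> int:
    """c_1 = 4 and c_(n+1) = 3 + 2 c_n, computed by applying the rule n - 1 times."""
    value = 4
    for _ in range(n - 1):
        value = 3 + 2 * value
    return value


# Part (1): the recursion agrees with the explicit formula on a sample range.
for n in range(1, 50):
    assert c(n) == 7 * 2 ** (n - 1) - 3


def iterate(f, n):
    """Return the n-fold iteration f^n, where f^1 = f and f^(n+1) = f o f^n."""
    if n == 1:
        return f
    previous = iterate(f, n - 1)
    return lambda x: f(previous(x))


# Part (2): iterating "add 3" four times adds 12.
add_three = lambda x: x + 3
print(iterate(add_three, 4)(0))  # 12
```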


In the formulation of Definition by Recursion we have used so far, we defined
a sequence a_1, a_2, a_3, ... in a set A by specifying that a_1 = b, and that a_(n+1) = k(a_n)
for all n ∈ N, where b and k are the appropriate objects. In particular, each a_(n+1) is a
function of a_n alone. In some situations, however, we might need a more complicated
formula for a_(n+1). For example, suppose that we want to define a sequence by
specifying that a_1 = 1, and a_(n+1) = n + a_n for all n ∈ N. Such a definition of a sequence is
not covered by Definition by Recursion (Theorem 6.4.1), though it does turn out to
produce a well-defined sequence, which starts 1, 2, 4, 7, 11, .... The following result,
a variant of Definition by Recursion, shows that everything works out as expected.


Theorem 6.4.3. Let A be a set, let b ∈ A and let t: A × N → A be a function.
Then there is a unique function g: N → A such that g(1) = b, and that g(n + 1) =
t((g(n), n)) for all n ∈ N.


Proof. This theorem is just a restatement of Exercise 6.2.5. 
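
The difference from Theorem 6.4.1 is that the rule t receives the index n as well as the previous value. A minimal Python sketch of this kind of recursion follows; the generator name recursive_sequence_with_index is hypothetical, and t is passed a pair, mirroring the notation t((g(n), n)).

```python
from itertools import islice


def recursive_sequence_with_index(b, t):
    """Yield g(1), g(2), ... where g(1) = b and g(n+1) = t((g(n), n))."""
    value, n = b, 1
    while True:
        yield value
        value = t((value, n))
        n += 1


# a_1 = 1 and a_(n+1) = n + a_n, the sequence mentioned before the theorem.
terms = list(islice(recursive_sequence_with_index(1, lambda pair: pair[1] + pair[0]), 5))
print(terms)  # [1, 2, 4, 7, 11]
```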


Example 6.4.4. 


(1) We want to define a sequence by specifying that a_1 = 1, and that a_(n+1) =
(n + 1)a_n for all n ∈ N. Using Theorem 6.4.3 with b = 1, and with t: R × N → R defined
by t((x, m)) = (m + 1)x for all (x, m) ∈ R × N, we see that there is a unique sequence
satisfying these conditions. This sequence starts 1, 2, 6, 24, 120, ..., and consists of
the familiar factorial numbers. We use the symbol n! to denote a_n, for all n ∈ N.
The reader might wonder whether we could have dispensed with the Definition by
Recursion entirely, and have simply defined a_n to be n! for all n ∈ N, but that would
be doing things backwards. The notation n! is informally defined by writing n! =
n(n − 1)(n − 2)⋯2·1, but this is not a rigorous definition, because of the appearance
of “⋯.” The formal way to define n! is to say that it is the value of a_n for the sequence
we have defined by recursion; doing so then gives a rigorous meaning to the ⋯
appearing in the expression n(n − 1)(n − 2)⋯2·1. From Definition by Recursion,
we deduce immediately that (n + 1)! = (n + 1)n! for all n ∈ N, because that is the
result of substituting n! for a_n in the condition a_(n+1) = (n + 1)a_n.

(2) In Proposition 6.3.3 we wrote the expression “1 + 2 + ⋯ + n,” and in
Exercise 6.3.1 we had similar expressions, such as “1^2 + 2^2 + ⋯ + n^2.” We now
use Theorem 6.4.3 to give this use of “⋯” a rigorous definition. In general, let
f: N → R be a function. We want to give a rigorous meaning to the expression
“f(1) + f(2) + ⋯ + f(n).”

Let q: R × N → R be defined by q((x, n)) = x + f(n + 1) for all (x, n) ∈ R × N.
We then apply Theorem 6.4.3 to the set R, the element f(1) ∈ R and the function q,
and we deduce that there is a unique function h: N → R such that h(1) = f(1), and
that h(n + 1) = q((h(n), n)) = h(n) + f(n + 1) for all n ∈ N. We now let the notation
“f(1) + f(2) + ⋯ + f(n)” be defined to mean h(n), for all n ∈ N. ♦
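
Both parts of this example are instances of one recursion scheme, which the following Python sketch makes explicit. The helper name define_recursively is hypothetical; it returns g(n) for the function g of Theorem 6.4.3, and factorial and partial_sum are obtained by supplying the functions t and q from the text.

```python
def define_recursively(b, t, n):
    """Return g(n), where g(1) = b and g(m+1) = t((g(m), m))."""
    value = b
    for m in range(1, n):
        value = t((value, m))
    return value


def factorial(n):
    # Part (1): b = 1 and t((x, m)) = (m + 1) x.
    return define_recursively(1, lambda pair: (pair[1] + 1) * pair[0], n)


def partial_sum(f, n):
    # Part (2): b = f(1) and q((x, m)) = x + f(m + 1).
    return define_recursively(f(1), lambda pair: pair[0] + f(pair[1] + 1), n)


print([factorial(n) for n in range(1, 6)])  # [1, 2, 6, 24, 120]
print(partial_sum(lambda i: i, 100))        # 5050
print(partial_sum(lambda i: i * i, 4))      # 30
```

The value 5050 agrees with the formula n(n+1)/2 of Proposition 6.3.3 for n = 100.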


Our next version of Definition by Recursion is used for a particularly interesting 
sequence, namely, the well-known Fibonacci sequence, which starts 


1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ....


The numbers in this sequence are referred to as Fibonacci numbers, named after the 
medieval mathematician Fibonacci (also known as Leonardo of Pisa), who discov- 
ered these numbers when investigating a mathematical problem concerning rabbits. 
See [Hun70, Chapter 12] for details. 

The Fibonacci numbers arise in a variety of unexpected places, such as in phyl- 
lotaxis, which is the study of certain numbers that arise in plants, for example, the 
numbers of petals in flowers, the numbers of spirals in pine cones, and others. See 
Figure 6.4.1 for some of the spirals formed by the seeds of a sunflower; it often
happens that the number of spirals in each direction is a Fibonacci number (we note
that the numbers of spirals in the two directions are not necessarily equal).
See [Cox61, Chapter 11] and [Rob84, Section 5.1.2] for further discussion and ref- 
erences to the use of Fibonacci numbers in phyllotaxis and other areas. Why the 
Fibonacci numbers show up in the study of plants appears not to be known, as stated 
in [Rob84, pp. 202-203]. On the other hand, in [Tho59, Chapter XIV], an earlier 
study of growth, form and shape in biological phenomena, it is claimed that there 
are mathematical reasons for the Fibonacci numbers appearing in pine cones and the 
like; the reader should decide for herself what to make of that author’s arguments. 
Even he says, however, “We come then without much ado to the conclusion that 
while the Fibonacci series stares us in the face in the fir-cone, it does so for mathe- 
matical reasons; and its supposed usefulness, and the hypothesis of its introduction 
into plant structure through natural selection, are matters which deserve no place in 
the plain study of botanical phenomena. As Sachs shrewdly recognized years ago, 
all such speculations as these hark to a school of mystical idealism.” 

What concerns us here is not biology but the mathematical properties of the Fi- 
bonacci numbers. Some mathematically serious treatments of the Fibonacci numbers 
are found in [Knu73, Section 1.2.8], [GKP94, Section 6.6] and [HHP97, Chapter 3]. 
See [Gar87] or [Hun70] for slightly more offbeat discussions of the Fibonacci num- 
bers. 

Let the elements of the Fibonacci sequence be denoted $F_1, F_2, \ldots$. An examination
of the sequence reveals its basic pattern, which is $F_{n+2} = F_{n+1} + F_n$ for all $n \in \mathbb{N}$.
Formally, the Fibonacci sequence is the unique sequence specified by $F_1 = 1$, and
$F_2 = 1$, and $F_{n+2} = F_{n+1} + F_n$ for all $n \in \mathbb{N}$. This type of definition of a sequence is
not covered by either Theorem 6.4.1 or Theorem 6.4.3, but the following variant of
these theorems suffices.


Theorem 6.4.5. Let $A$ be a set, let $a, b \in A$ and let $p\colon A \times A \to A$ be a function.
Then there is a unique function $f\colon \mathbb{N} \to A$ such that $f(1) = a$, that $f(2) = b$ and that

$$f(n+2) = p((f(n), f(n+1))) \quad \text{for all } n \in \mathbb{N}.$$


Proof. This theorem is just a restatement of Exercise 6.2.4. 


The Fibonacci sequence is defined using Theorem 6.4.5 with $a = 1$, with $b = 1$,
and with $p\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ defined by $p((x,y)) = x + y$ for all $(x,y) \in \mathbb{R} \times \mathbb{R}$.
The following proposition gives a few examples of formulas involving the sums and
products of Fibonacci numbers. For more such formulas (of which there are remarkably
many), see [Knu73, Section 1.2.8 and exercises] and [GKP94, Section 6.6], as well
as the exercises at the end of this section.
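To make the two-term recursion concrete, here is a minimal Python sketch (not from the text). The helper name `two_term_sequence` is hypothetical; it takes the data $a$, $b$ and $p$ of Theorem 6.4.5 and lists the first few values of the resulting function.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")


def two_term_sequence(a: A, b: A, p: Callable[[A, A], A], count: int) -> List[A]:
    """Return [f(1), ..., f(count)] with f(1) = a, f(2) = b, f(n + 2) = p(f(n), f(n + 1))."""
    terms = [a, b]
    while len(terms) < count:
        terms.append(p(terms[-2], terms[-1]))
    return terms[:count]


# The Fibonacci sequence: a = 1, b = 1 and p((x, y)) = x + y.
fib = two_term_sequence(1, 1, lambda x, y: x + y, 12)
print(fib)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
```

Changing `p` gives other sequences of the same type; for instance `lambda x, y: x * y` with $a = 2$ and $b = 3$ produces the sequence of Exercise 6.4.3.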


Proposition 6.4.6. Let $n \in \mathbb{N}$.


1. $F_1 + F_2 + \cdots + F_n = F_{n+2} - 1$.
2. $(F_1)^2 + (F_2)^2 + \cdots + (F_n)^2 = F_n F_{n+1}$.
3. If $n \geq 2$, then $(F_n)^2 - F_{n+1} F_{n-1} = (-1)^{n+1}$.


Proof. We will prove Part (3), leaving the rest to the reader in Exercise 6.4.6. 


(3). We use induction, using PMI-V3 with $k_0 = 2$. We see that
$(F_2)^2 - F_3 F_1 = 1^2 - 2 \cdot 1 = -1 = (-1)^{2+1}$, so the equation holds for $n = 2$.
Now let $n \in \mathbb{N}$. Suppose
that $n \geq 3$, and that the equation holds for all values in $\{2, \ldots, n\}$. (Given that we
already know that the equation holds for $n = 2$, it will suffice to prove that it holds
for $n \geq 3$, and restricting to such values of $n$ allows the following argument to work
without special cases.) We compute

$$\begin{aligned}
(F_{n+1})^2 - F_{n+2} F_n &= (F_n + F_{n-1})^2 - (F_{n+1} + F_n) F_n \\
&= (F_n)^2 + 2 F_n F_{n-1} + (F_{n-1})^2 - F_{n+1} F_n - (F_n)^2 \\
&= (F_{n-1})^2 + F_n (2 F_{n-1} - F_{n+1}) \\
&= (F_{n-1})^2 + F_n (2 F_{n-1} - (F_n + F_{n-1})) \\
&= (F_{n-1})^2 + F_n (F_{n-1} - F_n) \\
&= (F_{n-1})^2 - F_n F_{n-2} = (-1)^{(n-1)+1} = (-1)^{(n+1)+1},
\end{aligned}$$

where the last line holds by the inductive hypothesis.
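A numerical check is no substitute for the proof, but it is a quick way to catch a misremembered index. The following Python sketch (not from the text; `fibonacci` and `check_proposition` are hypothetical helper names) tests the three identities $F_1 + \cdots + F_n = F_{n+2} - 1$, $(F_1)^2 + \cdots + (F_n)^2 = F_n F_{n+1}$ and $(F_n)^2 - F_{n+1} F_{n-1} = (-1)^{n+1}$ for small $n$.

```python
def fibonacci(count: int) -> list:
    """Return [F_1, ..., F_count], with F_1 = F_2 = 1."""
    terms = [1, 1]
    while len(terms) < count:
        terms.append(terms[-2] + terms[-1])
    return terms[:count]


def check_proposition(n: int) -> bool:
    """Check the three identities above for a single n >= 1."""
    F = [None] + fibonacci(n + 2)  # F[i] is F_i; index 0 is unused
    part1 = sum(F[1:n + 1]) == F[n + 2] - 1
    part2 = sum(x * x for x in F[1:n + 1]) == F[n] * F[n + 1]
    # The third identity is only claimed for n >= 2.
    part3 = n < 2 or F[n] ** 2 - F[n + 1] * F[n - 1] == (-1) ** (n + 1)
    return part1 and part2 and part3


print(all(check_proposition(n) for n in range(1, 31)))  # True
```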


Although the natural way to think of the Fibonacci numbers is in terms of Definition
by Recursion, it turns out that there is also an explicit formula for these numbers,
which is

$$F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right] \tag{6.4.1}$$

for all $n \in \mathbb{N}$. This formula, which is proved in Exercise 6.4.12 (4), is known as
Binet’s formula (though it is attributed to Euler and Daniel Bernoulli in [GKP94,
Section 6.6] and [Tho59, Chapter XIV]). For those familiar with the “golden ratio,”
which equals $\frac{1 + \sqrt{5}}{2}$ and is often denoted $\phi$, observe that Binet’s formula is
$F_n = \frac{1}{\sqrt{5}} \left[ \phi^n - \left( -\frac{1}{\phi} \right)^n \right]$ for all $n \in \mathbb{N}$.
See Exercise 6.4.14 for another relation between the
Fibonacci numbers and the golden ratio. See [Hun70] for more on the golden ratio.
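As an informal illustration of Binet’s formula, the Python sketch below (not from the text; both function names are hypothetical) compares $\frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right]$ with the values given by the recursion. Floating-point arithmetic is only approximate, so the result is rounded to the nearest integer, and the comparison is restricted to a modest range of $n$.

```python
import math


def fib_by_recursion(n: int) -> int:
    """Return F_n using F_1 = F_2 = 1 and F_{n+2} = F_{n+1} + F_n."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def fib_by_binet(n: int) -> int:
    """Return F_n from the explicit formula, rounding away floating-point error."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    psi = (1 - sqrt5) / 2
    return round((phi ** n - psi ** n) / sqrt5)


print(all(fib_by_binet(n) == fib_by_recursion(n) for n in range(1, 41)))  # True
```

For large $n$ the floating-point version eventually loses accuracy, whereas the recursive version stays exact because Python integers have arbitrary precision.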

We conclude this section with an even more complicated version of Definition 
by Recursion than the ones we have seen so far. We will need this additional version 
in the proof of Theorem 6.6.7, which is part of our discussion of countable sets. The 
reader might find the following definition and proof to be somewhat technical upon 
first reading, but hopefully will not be deterred from working through the proof, 
which uses a clever construction. 

The idea of this variation of Definition by Recursion is that we want to have 
each term of the sequence be dependent upon all the terms that came earlier in the 
sequence, not just the previous term, or the previous two terms, or any other fixed 
number of previous terms. In other words, we want to define a sequence $c_1, c_2, c_3, \ldots$
by specifying $c_1$, and by specifying $c_{n+1}$ in terms of $c_1, \ldots, c_n$, for each $n \in \mathbb{N}$. That
is, we want $c_2$ to depend upon $c_1$, and $c_3$ to depend upon $c_1$ and $c_2$, and so on. The
complication here is that there cannot be a single function to specify $c_{n+1}$ in terms
of $c_1, \ldots, c_n$ that works for all $n \in \mathbb{N}$, because any single function must have a fixed
number of “variables.” To resolve this matter, we use the following definition.


Definition 6.4.7. Let $A$ be a set. Let $\mathcal{G}(A)$ be the set defined by

$$\mathcal{G}(A) = \bigcup_{n=1}^{\infty} \mathcal{F}(\{1, \ldots, n\}, A).$$ △


Theorem 6.4.8. Let $A$ be a set, let $b \in A$ and let $k\colon \mathcal{G}(A) \to A$ be a function.
Then there is a unique function $f\colon \mathbb{N} \to A$ such that $f(1) = b$, and that
$f(n+1) = k(f|_{\{1, \ldots, n\}})$ for all $n \in \mathbb{N}$.

Proof. We follow [Mun00, Section 8].

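Uniqueness: Let $s, t\colon \mathbb{N} \to A$ be functions. Suppose that $s(1) = b$ and $t(1) = b$,
and that $s(n+1) = k(s|_{\{1, \ldots, n\}})$ and $t(n+1) = k(t|_{\{1, \ldots, n\}})$ for all $n \in \mathbb{N}$. We will
show that $s(n) = t(n)$ for all $n \in \mathbb{N}$ by induction on $n$, using PMI-V2 (Theorem 6.3.8).
By hypothesis we know that $s(1) = b = t(1)$. Next, let $n \in \mathbb{N}$ and suppose
that $s(j) = t(j)$ for all $j \in \{1, \ldots, n\}$. Then $s|_{\{1, \ldots, n\}} = t|_{\{1, \ldots, n\}}$, and hence
$s(n+1) = k(s|_{\{1, \ldots, n\}}) = k(t|_{\{1, \ldots, n\}}) = t(n+1)$.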


Existence: There are three steps in the definition of f. 


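Step 1. We will show that for each $p \in \mathbb{N}$, there is a function $h_p\colon \{1, \ldots, p\} \to A$
such that $h_p(1) = b$, and that $h_p(n+1) = k(h_p|_{\{1, \ldots, n\}})$ for all $n \in \{1, \ldots, p-1\}$.

The proof is by induction on $p$. First, let $p = 1$. Then $\{1, \ldots, p\} = \{1\}$. Let
$h_1\colon \{1, \ldots, 1\} \to A$ be defined by $h_1(1) = b$. Observe that $\{1, \ldots, p-1\} = \{1, \ldots, 0\} = \emptyset$,
and hence $h_1(n+1) = k(h_1|_{\{1, \ldots, n\}})$ for all $n \in \{1, \ldots, p-1\}$ is necessarily true.

Next, let $p \in \mathbb{N}$. Suppose there is a function $h_p\colon \{1, \ldots, p\} \to A$ such that $h_p(1) = b$,
and that $h_p(n+1) = k(h_p|_{\{1, \ldots, n\}})$ for all $n \in \{1, \ldots, p-1\}$. Let $h_{p+1}\colon \{1, \ldots, p+1\} \to A$
be defined by

$$h_{p+1}(n) = \begin{cases} h_p(n), & \text{if } n \in \{1, \ldots, p\} \\ k(h_p), & \text{if } n = p+1. \end{cases}$$

Then $h_{p+1}|_{\{1, \ldots, p\}} = h_p$. It follows that $h_{p+1}(1) = h_p(1) = b$, that
$h_{p+1}(n+1) = h_p(n+1) = k(h_p|_{\{1, \ldots, n\}}) = k(h_{p+1}|_{\{1, \ldots, n\}})$ for all $n \in \{1, \ldots, p-1\}$,
and that $h_{p+1}(p+1) = k(h_p) = k(h_{p+1}|_{\{1, \ldots, p\}})$. The proof of
this step is then complete by PMI.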


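Step 2. Let $p, q \in \mathbb{N}$. Suppose that $p < q$. We will show that $h_q(n) = h_p(n)$ for all
$n \in \{1, \ldots, p\}$ by using Exercise 6.3.15. By Step 1 we know that $h_q(1) = b = h_p(1)$.
Next, suppose that $n \in \{1, \ldots, p-1\}$ and that $h_q(j) = h_p(j)$ for all $j \in \{1, \ldots, n\}$.
Then $h_q|_{\{1, \ldots, n\}} = h_p|_{\{1, \ldots, n\}}$, and hence by Step 1 we see that
$h_q(n+1) = k(h_q|_{\{1, \ldots, n\}}) = k(h_p|_{\{1, \ldots, n\}}) = h_p(n+1)$.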


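Step 3. Let $f\colon \mathbb{N} \to A$ be defined by $f(n) = h_n(n)$ for all $n \in \mathbb{N}$. Then
$f(1) = h_1(1) = b$ by Step 1. Let $p \in \mathbb{N}$. If $j \in \{1, \ldots, p\}$, then $j < p+1$, and it follows from
Step 2 that $f(j) = h_j(j) = h_{p+1}(j)$. Hence $f|_{\{1, \ldots, p\}} = h_{p+1}|_{\{1, \ldots, p\}}$, and by Step 1 we
deduce that $f(p+1) = h_{p+1}(p+1) = k(h_{p+1}|_{\{1, \ldots, p\}}) = k(f|_{\{1, \ldots, p\}})$. We therefore
see that $f$ satisfies the desired properties.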
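The kind of recursion treated in this proof, in which each new term may depend upon all of the earlier terms, is easy to imitate computationally. In the following Python sketch (not from the text) the hypothetical helper `course_of_values` represents the restriction of $f$ to $\{1, \ldots, n\}$ by the tuple $(f(1), \ldots, f(n))$, and the function `k` receives that tuple.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")


def course_of_values(b: A, k: Callable[[Tuple[A, ...]], A], count: int) -> List[A]:
    """Return [f(1), ..., f(count)], where f(1) = b and f(n + 1) = k((f(1), ..., f(n)))."""
    if count < 1:
        return []
    terms = [b]
    while len(terms) < count:
        # The tuple of all earlier terms stands in for the restricted function.
        terms.append(k(tuple(terms)))
    return terms


# Each term is one more than the sum of all the earlier terms.
print(course_of_values(1, lambda earlier: 1 + sum(earlier), 6))  # [1, 2, 4, 8, 16, 32]
```

Because `k` accepts tuples of every length, it plays the role of a single function defined on the union over all $n$ of the sets of functions $\{1, \ldots, n\} \to A$, which is why that union is needed as the domain.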


Exercises 


Exercise 6.4.1. Let $r_1, r_2, r_3, \ldots$ be the sequence defined by $r_1 = 1$, and $r_{n+1} = 4r_n + 7$
for all $n \in \mathbb{N}$. Prove that $r_n = \frac{1}{3}(10 \cdot 4^{n-1} - 7)$ for all $n \in \mathbb{N}$.




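Exercise 6.4.2. Let $b_1, b_2, b_3, \ldots$ be the sequence defined by $b_1 = 1$, and $b_2 = 1$, and
$b_n = \frac{1}{3}\left(b_{n-1} + \frac{3}{b_{n-2}}\right)$ for all $n \in \mathbb{N}$ such that $n \geq 3$. Prove that $1 \leq b_n \leq \frac{3}{2}$ for all
$n \in \mathbb{N}$.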


Exercise 6.4.3. Let $d_1, d_2, d_3, \ldots$ be the sequence defined by $d_1 = 2$, and $d_2 = 3$, and
$d_n = d_{n-1} \cdot d_{n-2}$ for all $n \in \mathbb{N}$ such that $n \geq 3$. Find an explicit formula for $d_n$, and
prove that your formula works.


Exercise 6.4.4. [Used in Exercise 4.4.20.] Let $A$ be a non-empty set, and let $f\colon A \to A$
be a function. Suppose that $f$ is bijective. Prove that $f^n$ is bijective for each $n \in \mathbb{N}$.


Exercise 6.4.5. For each $n \in \mathbb{N}$, find an example of a function $f\colon A \to A$ for some set
$A$ such that $f^n$ is a constant map, but $f^r$ is not a constant map for all $r \in \{1, \ldots, n-1\}$.


Exercise 6.4.6. [Used in Proposition 6.4.6.] Prove Proposition 6.4.6 (1) (2). 
Exercise 6.4.7. [Used in Section 8.6.] Let $n \in \mathbb{N}$.


(1) Prove that $2 \mid F_n$ if and only if $3 \mid n$.
(2) Prove that $3 \mid F_n$ if and only if $4 \mid n$.
(3) Prove that $4 \mid F_n$ if and only if $6 \mid n$.


Exercise 6.4.8. [Used in Section 8.6.] Let $n \in \mathbb{N}$. Suppose that $n > 5$. Prove that
$F_n = 5F_{n-4} + 3F_{n-5}$.


Exercise 6.4.9. [Used in Section 8.6.] Let $n, k \in \mathbb{N}$. Suppose that $k \geq 2$. Prove that
each of the following holds.


(1) $F_{n+k} = F_k F_{n+1} + F_{k-1} F_n$.
(2) $F_n \mid F_{kn}$.


Exercise 6.4.10. Define a sequence by specifying that $G_1 = 1$, that $G_2 = 1$ and that
$G_{n+2} = G_{n+1} + G_n + G_{n+1} G_n$ for all $n \in \mathbb{N}$. Prove that $G_n = 2^{F_n} - 1$ for all $n \in \mathbb{N}$.


Exercise 6.4.11. Let $n \in \mathbb{N}$.


(1) Let $\phi = \frac{1 + \sqrt{5}}{2}$ and $\phi' = \frac{1 - \sqrt{5}}{2}$. Prove that $\phi^n + (\phi')^n$ is an integer.
(2) Prove that the integer $5(F_n)^2 + 4(-1)^n$ is a perfect square.


Exercise 6.4.12. [Used in Section 6.4.] The purpose of this exercise is to prove Binet’s
formula (Equation 6.4.1). Let $c, d \in \mathbb{R}$. Suppose that $c$ and $d$ are non-zero, and that the
equation $x^2 - cx - d = 0$ has two distinct real solutions $r_1$ and $r_2$. Let $A_1, A_2, A_3, \ldots$
be a sequence satisfying $A_{n+2} = cA_{n+1} + dA_n$ for all $n \in \mathbb{N}$. (By Theorem 6.4.5 there
is a unique such sequence for each choice of $A_1$ and $A_2$.) Let

$$P = \frac{r_2 A_1 - A_2}{r_1 (r_2 - r_1)} \quad \text{and} \quad Q = \frac{r_1 A_1 - A_2}{r_2 (r_1 - r_2)}.$$

(1) Let $D_1, D_2, D_3, \ldots$ be the sequence defined by the explicit formula
$D_n = P(r_1)^n + Q(r_2)^n$ for all $n \in \mathbb{N}$. Verify that $D_1 = A_1$ and $D_2 = A_2$.
(2) Prove that $D_{n+2} = cD_{n+1} + dD_n$ for all $n \in \mathbb{N}$.
(3) Use Theorem 6.4.5 to deduce that $A_n = D_n$ for all $n \in \mathbb{N}$.
(4) Apply Part (3) of this exercise to the Fibonacci sequence, and deduce Equation 6.4.1.


Exercise 6.4.13. We discuss a curious geometric puzzle; see [Wea38] for the history 
of this puzzle. Start with a square that has sides of length 13 units. Dissect the square 
into four pieces, as depicted in Figure 6.4.2 (i). The four pieces can be rearranged 
into a rectangle, as shown in Figure 6.4.2 (ii). Try making the puzzle out of paper, and 
doing the rearranging. The curious thing is that the area of the square is $13^2 = 169$,
whereas the area of the rectangle is $21 \cdot 8 = 168$. How can it happen that the same
four pieces form shapes with different area? 


[Fig. 6.4.2: (i) the dissected square; (ii) the pieces rearranged into a rectangle. Segment labels in the figure: 5, 8 and 13.]


(1) Explain the puzzle by showing that there is a slight overlap among the pieces.

(2) We now generalize the above puzzle. Rather than starting with a square with
sides of length 13 units, and breaking the sides up into pieces of length 8 and
5, we start with an arbitrary square, and break its sides into pieces of lengths
$a$ and $b$. Find the only possible value for the ratio $\frac{a}{b}$ so that there is no overlap
or underlap when the pieces are rearranged into a rectangle.

(3) We continue Part (2) of this exercise. Suppose that we want a puzzle with $a$
and $b$ both natural numbers (as is the case in the original puzzle). Because
the areas of both the square and rectangle will be integers in this case, the
difference of these areas, which is the amount of overlap or underlap, will also
be an integer. Hence, with $a$ and $b$ both natural numbers, the minimal overlap
will be $\pm 1$. This minimal overlap is very hard to notice when the puzzle is
made out of pieces of paper, which is why it fools people. A larger overlap or
underlap would be much easier to spot. Prove that if $a$ and $b$ are consecutive
Fibonacci numbers, then the overlap or underlap is minimal. Observe that
the original puzzle did use consecutive Fibonacci numbers. (It can be shown,
moreover, that no two natural numbers other than two consecutive Fibonacci
numbers have the minimal overlap or underlap, though that requires a more
difficult proof.)

Exercise 6.4.14. [Used in Section 6.4 and Section 7.8.] This exercise is for the reader
who is familiar, at least informally, with limits of sequences. (We will discuss limits
of sequences rigorously, albeit briefly, in Section 7.8; see any introductory text in real
analysis, for example [Blo11, Chapter 8], for details.) We saw in Equation 6.4.1 that
the Fibonacci numbers can be computed using the number $\phi = \frac{1 + \sqrt{5}}{2} = 1.618\ldots$.
There is another relation between the Fibonacci numbers and $\phi$, which is seen by
looking at successive ratios of Fibonacci numbers, that is, the numbers

$$\frac{1}{1}, \frac{2}{1}, \frac{3}{2}, \frac{5}{3}, \frac{8}{5}, \frac{13}{8}, \ldots$$

A calculation of the first few terms of this sequence shows that they appear to
approach the number $1.618\ldots$, which looks suspiciously like $\phi$, at least up to a few
decimal places. In fact, it can be proved that

$$\lim_{n \to \infty} \frac{F_{n+1}}{F_n} = \phi.$$

The proof of this equation has two parts: (1) that the limit exists and (2) that the
limit equals $\phi$. The reader is asked to prove Part (2), assuming that Part (1) is true.
(Proving Part (1) is more advanced, requiring a knowledge of Cauchy sequences and
the completeness of the real numbers. See [Blo11, Example 8.4.10] for a detailed
proof of Part (1).)
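
The behavior of the successive ratios $\frac{F_{n+1}}{F_n}$ can be observed numerically. The Python sketch below is not from the text, and `fibonacci_ratios` is a hypothetical helper; it offers evidence only, and it neither proves that the limit exists nor identifies it.

```python
import math


def fibonacci_ratios(count: int) -> list:
    """Return [F_2/F_1, F_3/F_2, ..., F_(count+1)/F_count] as floats."""
    a, b = 1, 1                 # a = F_n and b = F_(n+1), starting with n = 1
    ratios = []
    for _ in range(count):
        ratios.append(b / a)
        a, b = b, a + b
    return ratios


phi = (1 + math.sqrt(5)) / 2
ratios = fibonacci_ratios(30)
print(ratios[:6])                      # starts 1.0, 2.0, 1.5, ...
print(abs(ratios[-1] - phi) < 1e-10)   # True
```

The ratios alternate between values below and above $\phi$, which can be seen by printing a few more terms.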
Intuitively, we know what it means to talk about the “size” of a finite set, and it
seems intuitively clear that finite sets come in different sizes. What about infinite 
sets? Does it make sense to discuss the “size” of an infinite set, and if it does, do 
infinite sets come in different sizes? Galileo, writing in the early seventeenth century 
in [Gal74, pp. 38-47], thought that all infinite sets had the same size. Though he had 
some very good insights into infinite sets, even the brilliant Galileo was mistaken on 
this matter, as we shall see below. A correct understanding of the sizes of infinite 
sets was due to Cantor, the developer of set theory, two and a half centuries after 
Galileo. In the remaining sections of this chapter we will see a number of important 
arguments by Cantor; these ideas helped propel set theory into its prominent role in 
modern mathematics. 

How do we determine when two sets have the same size? It might appear at first 
glance that to answer this question we would need to be able to compute the size 
of each of the two sets before we could compare them, and the need for finding the 
“size” of an infinite set might seem to be an insurmountable obstacle if we want to 
compare the sizes of different infinite sets. It turns out, and this is a great insight, that 
it is possible to discuss whether two sets have the “same size” without first having to 
figure out the size of each set. 

We start with a simple example. Suppose that a group of people want to stay at 
a hotel, with each person in a separate room. The hotel manager will take the group 
only if it completely fills up the hotel, and so it is necessary to figure out whether
the right number of rooms are vacant. This is a very simple problem to solve, but 
there are in fact two ways to proceed. One way would be to count the number of 
people, and count the number of free rooms, and then see if the two numbers are the 
same. Another way would be to make a list of people, a list of free rooms, and then 
start going down the two lists, matching up each successive person with a distinct 
vacant room; if all the people and all the rooms are taken care of by this process, 
then everyone would be happy. The method of matching up people and rooms is 
cumbersome, but unlike counting, it has the advantage of working even if the number 
of people and the number of rooms are infinite. The method of counting, by contrast, 
works only when everything is finite. 

To determine whether two sets have the same size, we will try to pair up the 
elements of the two sets. Our tool for “pairing up” is bijective functions, as in the 
following definition. 


Definition 6.5.1. Let $A$ and $B$ be sets. The sets $A$ and $B$ have the same cardinality,
denoted $A \sim B$, if there is a bijective function $f\colon A \to B$. △


Observe that Definition 6.5.1 refers only to whether two sets have the “same 
cardinality”; nothing is stated about the “cardinality” (which means size) of each of 
the two sets. Using bijective functions allows us to compare two sets, but not to say 
anything about each of the individual sets. 

If two sets have the same cardinality, then by definition there is a bijective func- 
tion from one to the other. Unless each of the two sets has only zero or one element, 
there will in fact be more than one such bijective function. When proving that two 
sets have the same cardinality, it is sufficient to find a single bijective function. 

The following lemma gives the basic properties of ~, which should look familiar. 


Lemma 6.5.2. Let A, B and C be sets. 


1. $A \sim A$.
2. If $A \sim B$, then $B \sim A$.
3. If $A \sim B$ and $B \sim C$, then $A \sim C$.


Proof. See Exercise 6.5.3. 


Lemma 6.5.2 might lead the reader to think of ~ as an equivalence relation, but 
we need to proceed with caution here. If ~ were a relation, on what set would it be 
a relation? We might want to think of ~ as a relation on the set of all sets, because 
for any two sets $A$ and $B$, it must be the case that either $A \sim B$ or $A \nsim B$. However,
because of foundational problems such as Russell’s Paradox, which was discussed in 
Section 3.5, we avoid things such as the set of all sets. Hence, although ~ satisfies 
the three properties of an equivalence relation, it is not technically a relation on a set 
at all. If, however, all sets of interest are subsets of a given set X, then it is correct to 
say that ~ is an equivalence relation on P(X). 

We now have some examples of sets that have the same cardinality. 
Example 6.5.3.


(1) Though he made one major mistake concerning infinite sets (to be discussed 
shortly), Galileo understood the idea of using bijective functions (as we now call 
them) to show that two sets have the same cardinality. In the following quote from 
[Gal74, pp. 40-41], Galileo discusses some sets of positive natural numbers in a 
dialogue between two of his protagonists. 

Salviati. ... If I say that all numbers, including squares and non-squares, are
more [numerous] than the squares alone, I shall be saying a perfectly true
proposition; is that not so?

Simplicio. One cannot say otherwise.

Salviati. Next, I ask how many are the square numbers; and it may be truly
answered that they are just as many as are their own roots, because every
square has its root, and every root its square; nor is there any square that has
more than just one root, or any root that has more than just one square.

Simplicio. Precisely so.

Salviati. But if I were to ask how many roots there are, it could not be denied
that those are as numerous as all the numbers, because there is no number
that is not the root of some square. That being the case, it must be said that
the square numbers are as numerous as all numbers, because they are as
many as their roots, and all numbers are roots.

In modern terminology, Galileo states that the set of natural numbers
$\mathbb{N} = \{1, 2, 3, \ldots\}$ and the set of squares $S = \{1, 4, 9, 16, \ldots\}$ have the same cardinality.
Galileo’s argument is precisely the same as our modern one, which is that there is a
bijective function $h\colon \mathbb{N} \to S$. The function $h$ that Galileo suggests is the most natural
one to use, namely, the function defined by $h(n) = n^2$ for all $n \in \mathbb{N}$. That $h$ is bijective
follows from the fact that $k\colon S \to \mathbb{N}$ defined by $k(n) = \sqrt{n}$ for all $n \in S$ is an inverse
of $h$, where we make use of Theorem 4.4.5 (3).

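(2) The set of natural numbers $\mathbb{N}$ and the set of integers $\mathbb{Z}$ have the same
cardinality. One choice of a bijective function $f\colon \mathbb{N} \to \mathbb{Z}$ is the one defined by

$$f(n) = \begin{cases} \frac{n}{2}, & \text{if } n \text{ is even} \\ -\frac{n-1}{2}, & \text{if } n \text{ is odd.} \end{cases}$$

It is left to the reader to verify that this function is bijective.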

(3) Let $a, b, c, d \in \mathbb{R}$. Suppose that $a < b$ and $c < d$. We will show that $[a,b] \sim [c,d]$,
that $(a,b) \sim (c,d)$, and similarly for half-open intervals. Let $g\colon [a,b] \to [c,d]$
be defined by

$$g(x) = \frac{d-c}{b-a}(x-a) + c$$

for all $x \in [a,b]$. It is straightforward to verify that the function $g$ is bijective; we
leave the details to the reader. It follows that $[a,b] \sim [c,d]$. A similar argument shows
that $(a,b) \sim (c,d)$, and similarly for half-open intervals; we omit the details.

(4) Let $a, b \in \mathbb{R}$. Suppose that $a < b$. We will show that $(a,b) \sim \mathbb{R}$. By Part (3)
of this example we know that $(a,b) \sim (-1,1)$. Hence, it is sufficient to show that
$(-1,1) \sim \mathbb{R}$. Actually, we have already done all the work of proving that fact,
because in Exercise 4.4.3 there is an example of a bijective function $f\colon \mathbb{R} \to (-1,1)$.
(Instead of this function $f$, it is common to use the function $h\colon (-\frac{\pi}{2}, \frac{\pi}{2}) \to \mathbb{R}$ defined
by $h(x) = \tan x$ for all $x \in (-\frac{\pi}{2}, \frac{\pi}{2})$, and then to use known properties of the tangent
function to show that $h$ is a bijective function. However, it is beyond the scope of this
book to give a rigorous treatment of the tangent function, and so we have provided
the more elementary function $f$.) ♦
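
Two of the bijections in this example are simple enough to experiment with. The Python sketch below is not from the text, and all three function names are hypothetical. It assumes that the map in Part (2) sends an even $n$ to $\frac{n}{2}$ and an odd $n$ to $-\frac{n-1}{2}$, and that the map in Part (3) is $g(x) = \frac{d-c}{b-a}(x-a) + c$. A check on finitely many inputs is evidence only, and does not replace the verification left to the reader.

```python
def nat_to_int(n: int) -> int:
    """Assumed rule for Part (2): n/2 for even n, -(n - 1)/2 for odd n, with n = 1, 2, 3, ..."""
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)


def int_to_nat(z: int) -> int:
    """Candidate inverse: a positive z comes from 2z, and z <= 0 comes from 1 - 2z."""
    return 2 * z if z > 0 else 1 - 2 * z


def interval_map(a: float, b: float, c: float, d: float, x: float) -> float:
    """Affine map of Part (3) from [a, b] onto [c, d], assuming a < b and c < d."""
    return (d - c) / (b - a) * (x - a) + c


print([nat_to_int(n) for n in range(1, 8)])                         # [0, 1, -1, 2, -2, 3, -3]
print(all(int_to_nat(nat_to_int(n)) == n for n in range(1, 1001)))  # True
print(interval_map(0, 1, 2, 6, 0), interval_map(0, 1, 2, 6, 1))     # 2.0 6.0
```

Exhibiting an inverse, as `int_to_nat` does, is the same strategy used in Part (1), where $k$ is an inverse of $h$.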


For a better analysis of the cardinality of sets, we need to make various useful 
distinctions, such as finite sets vs. infinite sets. We have used the notion of finite- 
ness intuitively until now in this text, but we are now prepared to deal with this 
concept more precisely. The simplest approach to finiteness makes use of subsets of 
the natural numbers of the form $\{1, \ldots, n\}$; recall the definition of such sets given in
Definition 6.2.6. More generally, observe in the following definition how important 
the set N is in understanding the cardinality of sets (this set is referred to directly or 
indirectly in the first four parts of the definition). 


Definition 6.5.4. 


1. A set is finite if it is either the empty set or it has the same cardinality as 
{1,...,n} for some n € N. 

2. A set is infinite if it is not finite. 

3. A set is countably infinite if it has the same cardinality as N. 

4. A set is countable (also called denumerable) if it is finite or countably infi- 
nite. 

5. A set is uncountable if it is not countable. △


The reader is asked to prove in Exercise 6.5.5 that if A and B are sets such that 
A ~ B, and if A is finite, infinite, countably infinite, countable or uncountable, then 
so is B. We will use this simple fact, without explicitly mentioning it, throughout the 
rest of this chapter. 

It is evident that three of the types of sets described in Definition 6.5.4 in fact 
exist. There are finite sets, because the set {1,...,n} is finite for all n € N; there are 
countably infinite sets, because N is countably infinite; and there are countable sets, 
because there are countably infinite sets. On the other hand, it is not immediately 
evident whether there are infinite sets, and whether there are uncountable sets. We 
will show that there are uncountable sets shortly, but we first turn to the existence 
of infinite sets. The reader might think that this fact is self-evident, because we have 
already remarked that there exist countably infinite sets. However, the terms “count- 
ably infinite” and “infinite” were defined entirely separately, and it is not true simply 
by definition that a “countably infinite” set is in fact “infinite,” and so a proof is
needed. The following lemma resolves this matter. 


Lemma 6.5.5. 


1. The set N is infinite. 
2. A countably infinite set is infinite. 
Proof.


(1). Suppose that $\mathbb{N}$ is finite. Because $\mathbb{N} \neq \emptyset$, then there is some $n \in \mathbb{N}$ such that
$\mathbb{N} \sim \{1, \ldots, n\}$. Let $f\colon \{1, \ldots, n\} \to \mathbb{N}$ be a bijective function. It then follows from
Theorem 6.3.11 (1) that there is some $k \in \{1, \ldots, n\}$ such that $f(k) \geq f(i)$ for any
$i \in \{1, \ldots, n\}$. Therefore $f(k) + 1 > f(i)$ for all $i \in \{1, \ldots, n\}$. Hence
$f(k) + 1 \notin f(\{1, \ldots, n\})$. Because $f(k) + 1 \in \mathbb{N}$, we deduce that $f$ is not surjective, which is a
contradiction. Hence $\mathbb{N}$ is not finite, and so it is infinite.


(2). Let B be a set. Suppose that B is countably infinite. Then B ~ N. Suppose 
further that B is finite. It would then follow from Exercise 6.5.5 that N is finite, which 
is a contradiction to Part (1) of this lemma. Hence B is infinite. 


From Part (1) of Lemma 6.5.5 we see that there are infinite sets. 

The one remaining question about Definition 6.5.4 is whether there are any un- 
countable sets. This issue is not at all trivial, and in fact it has fooled many great 
minds. For example, shortly after the quote from Galileo given above, Galileo con- 
tinues as follows. 


Salviati. I don’t see how any other decision can be reached than to say that
all the numbers are infinitely many; all squares infinitely many; all their 
roots infinitely many; that the multitude of squares is not less than that of 
all numbers nor is the latter greater than the former. And in final conclusion, 
the attributes of equal, greater, and less have no place in infinite, but only 
in bounded quantities. So when Simplicio proposes to me several unequal 
lines, and asks me how it can be that there are not more points in the greater 
than in the lesser, I reply to him that there are neither more, nor less, nor 
the same number, but in each there are infinitely many. Or truly, might I 
not reply to him that the points in one are as many as the square numbers; 
in another and greater line, as many as all numbers; and in some tiny little 
[line], only as many as the cube numbers .... 


In this quote Galileo essentially says that all infinite sets have the same cardi- 
nality, which would make them all countably infinite in our terminology. In fact, we 
will see in Corollary 6.5.8 below that Galileo was wrong, and that there are indeed 
uncountable sets. To prove that corollary, we make use of the cardinality of the power 
set of a set. We start with an example. 


Example 6.5.6. Let $A = \{1, 2\}$. Then $\mathcal{P}(A) = \{\emptyset, \{1\}, \{2\}, \{1,2\}\}$. Therefore
$A \nsim \mathcal{P}(A)$. ♦


The following theorem shows that Example 6.5.6 is typical.

Theorem 6.5.7. Let $A$ be a set. Then $A \nsim \mathcal{P}(A)$.


Proof. There are two cases. First, suppose that $A = \emptyset$. Observe that $\mathcal{P}(A) = \{\emptyset\}$, and
therefore there cannot be a bijective function $\mathcal{P}(A) \to A$, because there cannot be a
function from a non-empty set to the empty set. Hence $\mathcal{P}(A) \nsim A$.

Next, suppose that $A \neq \emptyset$. Suppose further that $A \sim \mathcal{P}(A)$. Then there is a bijective
function $f\colon A \to \mathcal{P}(A)$. Let $D = \{a \in A \mid a \notin f(a)\}$. Observe that $D \subseteq A$, and so
$D \in \mathcal{P}(A)$. Because $f$ is surjective, there is some $d \in A$ such that $f(d) = D$. Is $d \in D$?
Suppose that $d \in D$. Then by the definition of $D$ we see that $d \notin f(d) = D$.
Suppose that $d \notin D$. Then $d \in f(d) = D$. We therefore have a contradiction, and so
$A \nsim \mathcal{P}(A)$.
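
For a small finite set, the diagonal argument can be checked exhaustively. The Python sketch below is not from the text, and `diagonal_set` is a hypothetical helper. It runs through every function $f$ from $A = \{1, 2, 3\}$ to the power set of $A$ (represented as a dictionary) and confirms that the set $D = \{a \in A \mid a \notin f(a)\}$ is never a value of $f$, so that no such $f$ is surjective.

```python
from itertools import product


def diagonal_set(A, f):
    """Return D = {a in A : a not in f(a)}, where f maps each element of A to a subset of A."""
    return frozenset(a for a in A if a not in f[a])


A = [1, 2, 3]

# All 2^3 = 8 subsets of A, built from every True/False membership pattern.
subsets = [
    frozenset(a for a, keep in zip(A, mask) if keep)
    for mask in product([False, True], repeat=len(A))
]

# Every function f: A -> P(A) is a choice of one subset per element (8^3 = 512 functions).
never_hit = all(
    diagonal_set(A, dict(zip(A, images))) not in images
    for images in product(subsets, repeat=len(A))
)
print(never_hit)  # True
```

The exhaustive search is possible only because $A$ is finite; the argument in the proof works for every set at once.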


In the following proof we will use Theorem 6.6.5 (1) from Section 6.6, though 
there is no circular reasoning here, because the proof of Theorem 6.6.5 does not 
make use of the corollary we are about to prove. 


Corollary 6.5.8. The set P(N) is uncountable. 


Proof. By Theorem 6.5.7 we know that $\mathcal{P}(\mathbb{N}) \nsim \mathbb{N}$, and so $\mathcal{P}(\mathbb{N})$ is not countably
infinite. If we could show that $\mathcal{P}(\mathbb{N})$ were not finite, then it would follow that it is not
countable. Suppose that $\mathcal{P}(\mathbb{N})$ is finite. Let $T = \{\{n\} \mid n \in \mathbb{N}\} \subseteq \mathcal{P}(\mathbb{N})$. It follows from
Theorem 6.6.5 (1) that $T$ is finite. However, it is evident that $T \sim \mathbb{N}$, and this would
imply that $\mathbb{N}$ is finite, which is a contradiction to Lemma 6.5.5 (1). We conclude that
$\mathcal{P}(\mathbb{N})$ is uncountable.


Corollary 6.5.8 is not entirely satisfying, because, even though its proof is short, 
it would be nice to see a more familiar and concrete set that is uncountable. In fact, 
we will see in Theorem 6.7.3 that the set R is uncountable. 

Putting all our results so far together, we deduce that any set is precisely one of 
finite, countably infinite or uncountable, and that there are sets of each type. 

We conclude this section with two important theorems concerning cardinalities 
of sets. The proofs of these theorems are much trickier than what we have seen so 
far in this section. We start with the following definition. 


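Definition 6.5.9. Let $A$ and $B$ be sets. We say that $A \preccurlyeq B$ if there is an injective
function $f\colon A \to B$; we say that $A \prec B$ if $A \preccurlyeq B$ and $A \nsim B$. △


Intuitively, if $A \prec B$, then $A$ has “smaller size” than $B$.

Some basic properties of the relation $\preccurlyeq$ are given in Exercise 6.5.11. It is simple
to see that for any set $A$, there is an injective function $A \to \mathcal{P}(A)$, and hence $A \preccurlyeq \mathcal{P}(A)$;
by Theorem 6.5.7 we see that $A \prec \mathcal{P}(A)$. Applying this fact to the set $\mathbb{N}$, and then to
$\mathcal{P}(\mathbb{N})$, to $\mathcal{P}(\mathcal{P}(\mathbb{N}))$ and so on, we deduce that

$$\mathbb{N} \prec \mathcal{P}(\mathbb{N}) \prec \mathcal{P}(\mathcal{P}(\mathbb{N})) \prec \mathcal{P}(\mathcal{P}(\mathcal{P}(\mathbb{N}))) \prec \cdots.$$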


Because all the sets in this sequence other than the first are uncountable, we therefore 
see that there are infinitely many different cardinalities among the uncountable sets. 
A commonly used notation due to Cantor is the notation $\aleph_0$, which denotes the
cardinality of $\mathbb{N}$. Observe that $\aleph_0$ is not a real number, though it is referred to as
a “cardinal number.” Motivated by an observation made in Example 3.2.9 (2), it
is common to denote the cardinality of $\mathcal{P}(\mathbb{N})$ by $2^{\aleph_0}$. (It is also possible to define
cardinal numbers $\aleph_1, \aleph_2, \ldots$, though we will not do so; see [Vau95, Section 7.5] for
details.)

Our two theorems about cardinalities of sets are conveniently expressed using the
concept of $\preccurlyeq$. Although our notation (which looks suspiciously like $\leq$) might make
it appear as if these results are trivial, in fact neither is trivial at all.

Our first theorem, called the Schroeder–Bernstein Theorem (also known as the
Cantor–Bernstein Theorem), not only has aesthetic appeal (by proving the analog for
$\preccurlyeq$ of the fact that if $a \leq b$ and $b \leq a$, then $a = b$, for all $a, b \in \mathbb{R}$), but it is quite useful
as well, as we will see after the theorem.


Theorem 6.5.10 (Schroeder–Bernstein Theorem). Let $A$ and $B$ be sets. Suppose
that $A \preccurlyeq B$ and $B \preccurlyeq A$. Then $A \sim B$.


The idea of the proof of the Schroeder—Bernstein Theorem, the bulk of which 
is contained in the proof of the following lemma, is as follows. Let A, B and C be 
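sets. Suppose that $C \subseteq B \subseteq A$, and that $A \preccurlyeq C$. We want to show that $A \sim B$. By
definition there is an injective function $g\colon A \to C$. We need to define a bijective
function $h\colon A \to B$. It would be tempting to define the function $h$ by

$$h(x) = \begin{cases} g(x), & \text{if } x \in A - B \\ x, & \text{if } x \in B. \end{cases}$$
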
However, although h is certainly surjective, and although the restriction of h to each 
of A — B and B is injective, it is not the case that h as a whole is injective, because 
there is overlap between g(A — B) and B. We would then want to modify h by letting 
it equal the identity on only some subset of B, which would hopefully eliminate the 
overlap, though without ruining surjectivity. 

More specifically, let $X = g(A - B)$. A good guess at how to modify $h$ would be to
have $h$ equal the identity on $B - X$, and have $h$ equal $g$ on $(A - B) \cup X$. Unfortunately,
although there is no overlap anymore involving $g(A - B)$ and $B - X$, we may have
created a problem involving $g(X)$ and $B - X$. We would then need to modify $h$ again,
this time having $h$ equal the identity on $B - X - g(X)$, and having $h$ equal $g$ on
$(A - B) \cup X \cup g(X)$. This process never stops, but if we do not mind using an infinite
process, which can be done using recursion, the function $h$ can be defined in such a
way that it is bijective. For convenience, the function $h$ actually used in the proof of
the following lemma looks a bit different from the above, but it is just another way
of writing the same thing.


Lemma 6.5.11. Let $A$, $B$ and $C$ be sets. Suppose that $C \subseteq B \subseteq A$, and that $A \preccurlyeq C$.
Then $A \sim B$.


Proof. By definition there is an injective function g: A > C. 

Let 7) = A— B. We now use Definition by Recursion (Theorem 6.4.1) to define 
a sequence of subsets of A by specifying T) = g(To), and 7,41 = g(T;,) for alln EN. 
This definition is valid, because we can think of this sequence of subsets of A as a 
sequence of elements of P(A). Let T = UF Tn. By Theorem 4.2.4 (6) we see that 


ary=e(U r) = LJe(t) = U) Tui = Um cr. 


n=0 n=0 n=0 n=1 


228 6 Finite Sets and Infinite Sets 


Also, observe that T_0 ⊆ T, which means that A − B ⊆ T, and it follows from Theorem
3.3.8 (4) (6) that A − T ⊆ A − (A − B) = B.
Let h: A → B be defined by

\[
h(x) = \begin{cases} g(x), & \text{if } x \in T \\ x, & \text{if } x \in A - T. \end{cases}
\]

This definition makes sense because g(T) ⊆ C ⊆ B and A − T ⊆ B.

We now show that h is bijective. Let x, y ∈ A. Suppose that h(x) = h(y). If x, y ∈ T,
then h(x) = h(y) implies g(x) = g(y), and hence x = y by the injectivity of g. If
x, y ∈ A − T, then h(x) = h(y) implies x = y. If x ∈ T and y ∈ A − T, then h(x) = h(y)
implies g(x) = y, which implies that y ∈ g(T) ⊆ T, which is a contradiction, and
hence this case is not possible. The case where x ∈ A − T and y ∈ T is similarly not
possible. We conclude that h is injective.

Let b ∈ B. First, suppose that b ∈ T. Because b ∉ A − B = T_0, then b ∈ T_k for
some k ∈ N. Hence b ∈ g(T_{k−1}), which means that b = g(z) for some z ∈ T_{k−1} ⊆ T.
Because z ∈ T, then b = h(z). Second, suppose that b ∈ B − T. Then b ∈ A − T, and
hence h(b) = b. We conclude that h is surjective, and it follows that h is bijective.


Proof of Theorem 6.5.10 (Schroeder–Bernstein Theorem). By definition there are injective
functions p: A → B and q: B → A. Then p(A) ⊆ B, and q(p(A)) ⊆ q(B) ⊆ A.
By Exercise 6.5.4 we know that q(p(A)) ~ A and q(B) ~ B. From the former it follows
that A ≼ q(p(A)), and we then use Lemma 6.5.11 to deduce that A ~ q(B).
Hence A ~ B.


To obtain a more concrete understanding of the proof of the Schroeder–Bernstein
Theorem (Theorem 6.5.10), the reader is asked in Exercise 6.5.12 to compute the sets
T_0, T_1, ... and the function h in the proof of Lemma 6.5.11 for a simple example.
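
The construction in the proof of Lemma 6.5.11 can also be tried out on a computer for sets of natural numbers. The following Python sketch is only an illustration, and it relies on assumptions that are not part of the lemma: A = N, membership in B is given by a predicate, and the injective function g satisfies g(x) > x for all x, so that the part of each T_n inside a finite window {1, ..., limit} can be computed exactly. The names lemma_h, in_B and limit are hypothetical. The example uses B = {2, 3, ...}, C = {3, 4, ...} and g(x) = x + 2, which is not the data of Exercise 6.5.12.

```python
def lemma_h(g, in_B, limit):
    """Return h from the proof of Lemma 6.5.11, restricted to {1, ..., limit}.

    Assumptions (for this sketch only): A = {1, 2, 3, ...}, B = {x : in_B(x)},
    and g is injective with g(x) > x for every x.
    """
    window = range(1, limit + 1)
    layer = {x for x in window if not in_B(x)}   # T_0 = A - B, inside the window
    T = set()
    while layer:
        T |= layer
        # T_{n+1} = g(T_n); values above the window are discarded, which loses
        # nothing inside the window because g(x) > x.
        layer = {g(x) for x in layer if g(x) <= limit}
    return {x: (g(x) if x in T else x) for x in window}


h = lemma_h(lambda x: x + 2, lambda x: x >= 2, 10)
print(h)
```

In this example T turns out to be the set of odd numbers, so h moves each odd number up by 2 and fixes each even number.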

The benefit of using the Schroeder—Bernstein Theorem is that there are cases 
where it is easier to find two injective functions than a single bijective function. 


Example 6.5.12. Let a, b ∈ R. Suppose that a < b. We will use the Schroeder–Bernstein
Theorem (Theorem 6.5.10) to prove that [a,b] ~ (a,b). By Example 6.5.3 (3)
we know that [a,b] ~ [−1,1] and (a,b) ~ (−1,1). Hence, it will suffice to prove
that (−1,1) ~ [−1,1]. Let f: (−1,1) → [−1,1] be defined by f(x) = x for all x ∈
(−1,1), and let g: [−1,1] → (−1,1) be defined by g(x) = x/2 for all x ∈ [−1,1].
Then both f and g are injective, and hence (−1,1) ≼ [−1,1] and [−1,1] ≼ (−1,1).
The Schroeder–Bernstein Theorem now implies that [−1,1] ~ (−1,1), and therefore
[a,b] ~ (a,b).

Similar arguments, making use of the Schroeder–Bernstein Theorem together
with Example 6.5.3 (4), show that any two of the following intervals have the same
cardinality: [a,b], [a,b), (a,b], (a,b), [a,∞), (a,∞), (−∞,b], (−∞,b) and (−∞,∞).
The reader is asked to provide the details of one specific case in Exercise 6.5.13 (1).

We mention that whereas the above proof that (a,b) ~ [a,b] is short, it is not 
entirely satisfying, because it would be nicer to see an explicit bijective function 
f: [a,b] → (a,b). The reader is asked to find such a function in Exercise 6.5.14,
though doing so is admittedly trickier than using the Schroeder–Bernstein Theorem.


The proof of the following theorem, called the Trichotomy Law for Sets, relies 
upon the Axiom of Choice. It is not just for convenience that we make use of this 
axiom, but it is a necessity, because the Trichotomy Law for Sets is in fact equivalent 
to the Axiom of Choice; see [Sto79, Section 2.9] or [RR85, Section I.3] for details. 
Rather than using the Axiom of Choice directly in this proof, we use Zorn’s Lemma 
(Theorem 3.5.6), which is equivalent to the Axiom of Choice, and is easier to use in 
the present situation. 


Theorem 6.5.13 (Trichotomy Law for Sets). Let A and B be sets. Then A ≼ B or
B ≼ A.


Proof. We need to show that there is an injective function f: A → B or an injective
function g: B → A. If A or B is empty these functions exist trivially, so we will
assume that A and B are both non-empty.

A partial function from A to B is a function of the form f: J → B, where J ⊆ A.
We can think of a partial function from A to B as a subset F ⊆ A × B such that for
each a ∈ A, there is at most one pair in F of the form (a,b). Hence, we can apply the
concepts of subset and union to partial functions from A to B.

Let 𝒫 be the set of all injective partial functions from A to B. Observe that 𝒫 ≠ ∅,
because ∅ ∈ 𝒫. Let 𝒞 be a chain in 𝒫. We claim that ⋃_{F∈𝒞} F ∈ 𝒫. Suppose that
(a,b), (a,c) ∈ ⋃_{F∈𝒞} F, for some a ∈ A and b, c ∈ B. Then (a,b) ∈ G and (a,c) ∈ H
for some partial functions G, H ∈ 𝒞. Because 𝒞 is a chain, we know that G ⊆ H
or G ⊇ H. Without loss of generality assume that G ⊆ H. Then (a,b) and (a,c)
are both in H, and because H is a partial function, then it must be the case that
b = c. We conclude that ⋃_{F∈𝒞} F is a partial function from A to B. Next, suppose that
(c,e), (d,e) ∈ ⋃_{F∈𝒞} F, for some c, d ∈ A and e ∈ B. A similar argument shows that
(c,e) and (d,e) must both be in some K ∈ 𝒞, and because K is an injective partial
function, then it must be the case that c = d. We conclude that ⋃_{F∈𝒞} F is an injective
partial function from A to B, and hence that ⋃_{F∈𝒞} F ∈ 𝒫.

By Zorn’s Lemma (Theorem 3.5.6) the family of sets 𝒫 has a maximal element.
Let M ∈ 𝒫 be such a maximal element. Then M is an injective partial function from
A to B. There are now three cases. First, suppose that for each a ∈ A, there is a pair
of the form (a,b) in M. Then M is an injective function A → B. Second, suppose that
for each d ∈ B, there is a pair of the form (c,d) ∈ M. Then M is a bijective partial
function from A to B, and using Exercise 4.4.13 (3) we see that the inverse function
of M can be viewed as an injective function B → A. Third, suppose that neither of
the previous two cases holds. Then there is some x ∈ A such that there is no pair
of the form (x,b) in M, and there is some y ∈ B such that there is no pair of the
form (a,y) ∈ M. Let N = M ∪ {(x,y)}. It is left to the reader to verify that N is an
injective partial function from A to B, and hence that N ∈ 𝒫. Because M ⊊ N, we
have a contradiction to the fact that M is a maximal element of 𝒫, and so this third
case cannot happen.
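
For finite sets, a maximal injective partial function can be found without Zorn’s Lemma, simply by adding one pair at a time until the situation of the third case no longer occurs. The following Python sketch illustrates this idea. The function name maximal_injective_partial and the representation of sets as lists and of partial functions as dictionaries are assumptions made for illustration only, and the sketch says nothing about infinite sets.

```python
def maximal_injective_partial(A, B):
    """Return a maximal injective partial function from A to B as a dict.

    A and B are finite lists without repeated elements (an assumption of this
    sketch). Pairs (x, y) are added while some x in A is not yet in the domain
    and some y in B is not yet in the range.
    """
    M = {}
    unused = list(B)            # elements of B not yet in the range of M
    for x in A:
        if not unused:          # every element of B is used: second case
            break
        M[x] = unused.pop(0)    # add the pair (x, y), as in the third case
    return M


M = maximal_injective_partial([1, 2, 3], ["u", "v"])
print(M)
```

In this run the domain of M is not all of A, but the range of M is all of B, so the inverse of M gives an injective function B → A.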
Exercises


Exercise 6.5.1. Prove that the set of all integers that are multiples of 5 has the same 
cardinality as the set of all integers. 


Exercise 6.5.2. Prove that the disk in R^2 of radius 3 centered at (1,2) has the same
cardinality as the unit disk in R^2 centered at the origin.


Exercise 6.5.3. [Used in Lemma 6.5.2.] Prove Lemma 6.5.2. 


Exercise 6.5.4. [Used repeatedly.] Let A and B be sets, let X ⊆ A be a subset and let
f: A → B be a function. Suppose that f is injective. Prove that X ~ f(X).


Exercise 6.5.5. [Used repeatedly.] Let A and B be sets. Suppose that A ~ B. Prove 
that if A is finite, infinite, countably infinite, countable or uncountable, then so is B. 


Exercise 6.5.6. [Used in Theorem 6.7.4 and Exercise 6.7.1.] 


(1) Give an example of sets A, B and C such that A ~ B and A ∪ C ≁ B ∪ C.

(2) Let A, B and C be sets. Suppose that A ~ B and that A ∩ C = ∅ and B ∩ C = ∅.
Prove that A ∪ C ~ B ∪ C.

(3) Let A, B and C be sets. Suppose that A ∪ C ~ B ∪ C and that A ∩ C = ∅ and B ∩
C = ∅. Is it necessarily the case that A ~ B? Give a proof or a counterexample.


Exercise 6.5.7. Let A and B be sets. Prove that A ~ B implies that P(A) ~ P(B). 


Exercise 6.5.8. [Used in Theorem 6.7.1.] Let A and F be sets. Suppose that F is 
finite, and that A is respectively finite, infinite, countably infinite, countable or un- 
countable. 


(1) Prove that A — F is respectively finite, infinite, countably infinite, countable 
or uncountable. 

(2) Prove that A U F is respectively finite, infinite, countably infinite, countable 
or uncountable. 


Exercise 6.5.9. Let A be a set, and let x be an element (not necessarily in A). Prove
that A × {x} ~ A.


Exercise 6.5.10. Let A, B, C and D be sets. Suppose that A ~ B and C ~ D. Prove
that A × C ~ B × D.


Exercise 6.5.11. [Used in Section 6.5.] Let A, B and C be sets. 


(1) Prove that ∅ ≼ A.
(2) Prove that A ≼ A.
(3) Prove that if A ≼ B and B ≼ C, then A ≼ C.


Exercise 6.5.12. [Used in Section 6.5.] Let A = N, let B = {2,3,...} and let C =
{2,3,...}. It is evident how to define a bijective function A → B, and we do not
need the Schroeder–Bernstein Theorem (Theorem 6.5.10) to find such a function.
However, in order to see a concrete example of how the proof of Lemma 6.5.11 (and
hence of the Schroeder–Bernstein Theorem) works, the reader is asked to compute
the sets T_0, T_1, ... and the function h: A → B defined in the proof of Lemma 6.5.11
using the function g: A → C defined by g(x) = x + 5 for all x ∈ A.
Exercise 6.5.13. Let a, b, c, d ∈ R. Suppose that a < b and c < d. Use the Schroeder–
Bernstein Theorem (Theorem 6.5.10) to prove the following statements.


(1) [Used in Example 6.5.12.] [a,b) ~ R.
(2) Let X, Y ⊆ R be subsets. If (a,b) ⊆ X and (c,d) ⊆ Y, then X ~ Y.


Exercise 6.5.14. [Used in Example 6.5.12.] Let a,b € R. Suppose that a < b. We 
saw in Example 6.5.12 that [a,b] ~ (a,b). That proof was apparently brief, though 
the brevity was illusory, because the proof relied upon our work proving first 
Zorn’s Lemma (Theorem 3.5.6), and then the Schroeder—Bernstein Theorem (The- 
orem 6.5.10). Moreover, the proof in Example 6.5.12 did not explicitly exhibit a 
bijective function between the two intervals, and as such is not as concrete as possi- 
ble. 

In this exercise the reader is asked to prove that [a,b] ~ (a,b) by finding a bijective
function f: [a,b] → (a,b). Consider functions that are not continuous.


Exercise 6.5.15. The proof of the Schroeder—Bernstein Theorem (Theorem 6.5.10), 
the bulk of which is found in the proof of Lemma 6.5.11, makes use of Definition by 
Recursion, and hence it makes use of the properties of the natural numbers. However, 
the statement of the Schroeder—Bernstein Theorem does not involve the natural num- 
bers, and it would be nice to have a proof of the theorem that does not involve any 
particular set of numbers. The purpose of this exercise is to provide such a proof of 
Lemma 6.5.11. This alternative proof is, unfortunately, even less transparent than the 
proof of the lemma found in the text. Essentially, the idea is to replace the recursive 
process with the notion of a fixed point, as defined in Exercise 4.2.15. 

Let A, B and C be sets. Suppose that C ⊆ B ⊆ A, and that A ≼ C. Then there is an
injective function f: A → C.

(1) Let g: P(A) → P(A) be defined by g(X) = [B − f(A)] ∪ f(X) for all X ∈
P(A). It follows from Theorem 4.2.4 (4) that g is monotone, as defined in
Exercise 4.2.15. We then use that exercise to deduce that there is some V ∈
P(A) such that g(V) = V. Prove that B = V ∪ [f(A) − f(V)], and that V ∩
[f(A) − f(V)] = ∅. [Use Exercise 3.3.13 (1).]

(2) Let h: A → B be defined by

\[
h(x) = \begin{cases} f(x), & \text{if } x \in A - V \\ x, & \text{if } x \in V. \end{cases}
\]

Prove that this function is well-defined. More specifically, prove that h(x) ∈ B
for all x ∈ A. [Use Exercise 4.4.11.]

(3) Prove that the function h defined in Part (2) of this exercise is bijective. Conclude
that A ~ B. [Use Exercise 4.4.11.]
In Section 6.5 we looked at the general idea of sets having the same cardinality. We
now give a more detailed look at two of the types of sets we saw in Definition 6.5.4, 
namely, finite sets and countable sets. 
For sets in general, we did not assign a numerical value to each set that would
be called the “size” of the set, because for infinite sets we would need to assign 
something other than real numbers (each of which is finite), and given what we have 
at our disposal in this text there are no other such “numbers” we can use. See [Pot04, 
Chapters 9 and 12] and [HJ99, Chapters 5 and 6] for discussion of cardinal and 
ordinal numbers, which are different types of “infinite numbers” that are relevant to 
the cardinality of sets. 

For finite sets, however, we can assign a number to each set that represents how 
many elements are in the set. In Section 3.2 we mentioned the notation |A| for the 
number of elements of a finite set A, and we subsequently used this notion repeatedly 
in an informal manner. We are now in a position to make this concept rigorous. 


Definition 6.6.1. Let A be a set. Suppose that A is finite. The cardinality of A,
denoted |A|, is defined as follows. If A = ∅, let |A| = 0. If A ≠ ∅, let |A| = n, where
A ~ {1,...,n}.


The alert reader might have noticed a potential problem with Definition 6.6.1. 
Could it happen that a set has the same cardinality as both {1,...,n} and {1,...,m}
for two different natural numbers n and m? If that could happen, then Definition 6.6.1 
would make no sense. Our intuition tells us that this problem cannot occur, but that 
fact needs to be proved. Actually, we have already done the hard work of proving 
that fact in Theorem 6.3.11 (3), which immediately implies the following lemma, 
which we state without proof. 


Lemma 6.6.2. Let n, m ∈ N. Then {1,...,n} ~ {1,...,m} if and only if n = m.

We leave it to the reader to deduce the following corollary to Lemma 6.6.2.


Corollary 6.6.3. Let A and B be sets. Suppose that A and B are finite. Then A ~ B if
and only if |A| = |B|.


Although Corollary 6.6.3 seems rather obvious, it actually tells us something of 
real substance. Recall the problem of finding hotel rooms for people mentioned at 
the start of this section. We stated that there were two ways to compare the size of 
the set of people wanting to stay at the hotel and the size of the set of available hotel 
rooms: pairing up elements of the two sets, or counting the number of elements in 
each set and comparing the numbers. In the two approaches we compare different 
things, namely, sets in the first approach and numbers in the second. Corollary 6.6.3 
tells us that we will always obtain the same result by either method. 


Example 6.6.4. Let B = {1, 4, 9, 16}. We can formally show that |B| = 4 by showing
that B ~ {1,...,4} = {1,2,3,4}. To prove this last claim, let h: B → {1,...,4} be
defined by h(x) = √x for all x ∈ B. It is easy to verify that the function h is bijective.
Needless to say, the use of a formal proof to demonstrate that |B| = 4 in this particular
case is a bit of overkill, and we will not feel the need to give any more such proofs
concerning the cardinalities of such finite sets. It is nice to know, however, that such
proofs can be constructed.
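
The verification that h is bijective in Example 6.6.4 can also be done mechanically. The following short Python sketch checks injectivity and surjectivity by brute force. It is only a sanity check for this one finite example, not a substitute for a proof, and the variable names are chosen for illustration.

```python
from math import isqrt

B = {1, 4, 9, 16}
target = {1, 2, 3, 4}

# h(x) = sqrt(x); isqrt is exact here because every element of B is a perfect square.
h = {x: isqrt(x) for x in B}

is_injective = len(set(h.values())) == len(B)   # distinct inputs, distinct outputs
is_surjective = set(h.values()) == target       # every element of {1,...,4} is hit
print(is_injective and is_surjective)
```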


We now see the most basic properties of the cardinalities of finite sets. The reader 
has probably used these properties many times without having had a second thought, 
though of course the properties need to be proved. Additional properties of the cardinalities
of finite sets may be found in Sections 7.6 and 7.7.


Theorem 6.6.5. Let A be a set. Suppose that A is finite.
1. If X ⊆ A, then X is finite.
2. If X ⊆ A, then |A| = |X| + |A − X|.
3. If X ⊊ A, then |X| < |A|.
4. If X ⊊ A, then X ≁ A.


Proof. 
(1). This part of the theorem follows immediately from Theorem 6.3.11 (2). 


(2). If A − X = ∅, then the result is trivial, so assume otherwise. Let n = |A|. Because
A − X ≠ ∅, then A ≠ ∅, and therefore n ≠ 0. Let f: A → {1,...,n} be a bijective
function. We can then apply Theorem 6.3.11 (2) to the subset f(X) of {1,...,n} to
find a bijective function g: {1,...,n} → {1,...,n} such that g(f(X)) = {1,...,k}
for some k ∈ N such that k ≤ n. By Lemma 4.4.4 (3) we see that g ∘ f is bijective.
It then follows from Exercise 6.5.4 and Exercise 4.3.5 that X ~ {1,...,k}, which
means that |X| = k. Using Exercise 4.3.5 again and Exercise 4.4.11, we see that

\[
\begin{aligned}
(g \circ f)(A - X) &= (g \circ f)(A) - (g \circ f)(X) = g(f(A)) - g(f(X)) \\
&= \{1,\ldots,n\} - \{1,\ldots,k\} = \{k+1,\ldots,n\}.
\end{aligned}
\]

It follows from Exercise 6.5.4 that A − X ~ {k+1,...,n}. By Exercise 6.2.2 (2) there
is a bijective function from {k+1,...,n} to {1,...,n−k}. We deduce that A − X ~
{1,...,n−k}, and hence that |A − X| = n − k. The desired result now follows.


(3). This part of the theorem follows from Part (2). 


(4). This part of the theorem follows from Part (3) and Corollary 6.6.3. 


The following result is a simple corollary to Theorem 6.6.5 (1); details are left to 
the reader. 


Corollary 6.6.6. Let A be a set. Then A is infinite if and only if it contains an infinite 
subset. 


Theorem 6.6.5 (4) might seem trivial, but it should not be taken for granted, 
because it does not hold for all sets. For example, the set of natural numbers N is a 
proper subset of Z, and yet we saw in Example 6.5.3 (2) that N ~ Z. In fact, as we 
will see in Theorem 6.6.12 below, Theorem 6.6.5 (4) completely characterizes finite 
sets. In order to prove this characterization, however, we need to learn more about 
countable sets, and it is to that topic that we now turn. 

Formally, a set is countably infinite if it has the same cardinality as N. Intuitively, 
that means that a set is countably infinite if its elements can be “lined up” in some 
order, so that the set has a first element, a second element and so on, in the same way 
that the elements of N are lined up. As an example of how to line up the elements of a 
countably infinite set, recall Example 6.5.3 (2), in which we saw a bijective function 
f: N → Z, which showed that Z is countably infinite. If we think of the integers in
increasing order, which we standardly write as ..., −2, −1, 0, 1, 2, ..., then there
is no obvious “first integer,” “second integer” and so on. However, we can use the 
bijective function f to line up the integers in an alternative way. Because the function 
f is bijective, we know that the sequence f(1), f(2), f(3), f(4), ... contains each 
integer once and only once. Using the definition of the function f, we see that f(1), 
f(2), f(3), f(4), ... equals 0, 1, —1, 2, —2, ..., and we therefore have the entire 
set of integers nicely arranged in a way that has a first element, second element and 
so on. Of course, this arrangement of the integers is not in order of increasing size, 
but it would be too much to expect that. We saw in Corollary 6.5.8 that there are 
uncountable sets, which means that there are infinite sets that cannot be lined up so 
that there is a first element, a second element and so on. 
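
The lining up of the integers described above is easy to reproduce. The following Python sketch assumes a particular formula for f, namely f(n) = n/2 when n is even and f(n) = −(n − 1)/2 when n is odd. This formula is an assumption chosen only because it produces the values 0, 1, −1, 2, −2, ... listed above; the definition in Example 6.5.3 (2) is not repeated here.

```python
def f(n):
    """A bijection N -> Z with f(1), f(2), f(3), ... = 0, 1, -1, 2, -2, ...

    The formula is an assumption chosen to match the values listed in the text.
    """
    if n % 2 == 0:
        return n // 2
    return -((n - 1) // 2)


first_values = [f(n) for n in range(1, 8)]
print(first_values)
```

Every integer between −10 and 10 appears exactly once among f(1), ..., f(21), which is the finite shadow of the fact that f is bijective.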

For our first result about countable sets, recall that Theorem 6.6.5 (1) stated that 
a subset of a finite set is finite. The following theorem shows that the analogous 
result is true for countable sets as well. This theorem shows why it is often useful 
to work with the broader concept of countable sets, rather than the narrower concept 
of countably infinite sets, because the theorem would not be true if we replaced 
“countable” with “countably infinite’; observe that a subset of a countably infinite 
set need not be countably infinite, because it can be finite. 

The intuitive idea of the proof of the theorem is as follows. The interesting case in
the proof is when X is an infinite subset of N. Because X is non-empty, we can apply
the Well-Ordering Principle (Theorem 6.2.5) to find some c_1 ∈ X such that c_1 ≤ x for
all x ∈ X. Then c_1 < x for all x ∈ X − {c_1}. Let X_2 = X − {c_1}. Because X is infinite,
then X_2 ≠ ∅, and by the same argument as before there is some c_2 ∈ X_2 such that
c_2 < x for all x ∈ X_2 − {c_2} = X − {c_1, c_2}. Let X_3 = X − {c_1, c_2}. We can continue
this process forever, and we therefore obtain a sequence c_1, c_2, c_3, ... in X such that
c_1 < c_2 < c_3 < ⋯. The rest of the proof consists of showing that this sequence in
fact contains all the elements of X. Because all the elements of X can be lined up
c_1, c_2, c_3, ..., and because X is infinite, it follows that X is countably infinite. However,
the phrase “we can continue this process forever” is not rigorous, and we make
this proof rigorous by using the version of Definition by Recursion given in Theorem
6.4.8. The function f: N → A in the proof replaces the sequence c_1, c_2, c_3, ....
Before proceeding, the reader should review the notation given in Definition 6.4.7.
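
The informal process of repeatedly removing the least element can be imitated by a short program. The following Python sketch is an illustration only: it assumes that the infinite subset X of N is described by a predicate (here called in_X, a hypothetical name), and it produces the elements c_1 < c_2 < c_3 < ... one at a time. The example takes X to be the set of perfect squares.

```python
from itertools import count, islice
from math import isqrt


def enumerate_subset(in_X):
    """Yield c_1 < c_2 < c_3 < ... for an infinite subset X of N.

    X is described by the predicate in_X (an assumption of this sketch); if X
    were finite, the search for the next element would never terminate.
    """
    previous = 0
    while True:
        # The least element of X - {c_1, ..., c_n} is the least element of X
        # that is larger than c_n, because c_1, ..., c_n are the n smallest.
        c = next(x for x in count(previous + 1) if in_X(x))
        yield c
        previous = c


squares = list(islice(enumerate_subset(lambda x: isqrt(x) ** 2 == x), 5))
print(squares)
```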


Theorem 6.6.7. Let A be a set. Suppose that A is countable. If X ⊆ A, then X is
countable.


Proof. We follow [Mun00, Section 7]. Let X ⊆ A.

If A is finite, then by Theorem 6.6.5 (1) we know that X is finite, and hence it is 
countable. Now assume that A is countably infinite. We will prove the theorem for 
the special case that A = N. For the general case, we observe that if A is countably 
infinite, then there is a bijective function f: A — N, and the desired result follows 
from the fact that X ~ f(X), which holds by Exercise 6.5.4, and that f(X) is a subset 
of N. 

Suppose that A = N. If X is finite, then it is countable by definition, and there is 
nothing to prove. Now suppose that X is infinite. 
By the Well-Ordering Principle (Theorem 6.2.5), there is a unique element b ∈
X such that b ≤ x for all x ∈ X. Let k: G(X) → X be defined as follows. Let g ∈
G(X). Then g ∈ F({1,...,n}, X) for some n ∈ N, which means that g is a function
{1,...,n} → X. It follows from Exercise 6.6.3 that g cannot be surjective, and hence
X − g({1,...,n}) ≠ ∅. Using the Well-Ordering Principle again we see that there is a
unique element z_g ∈ X − g({1,...,n}) such that z_g ≤ x for all x ∈ X − g({1,...,n}).
We then let k(g) = z_g.

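We can apply Theorem 6.4.8 to b and k as above, and we deduce that there is
a function f: N → X such that f(1) = b, and that f(n+1) = k(f|_{{1,...,n}}) for
all n ∈ N. By the definition of k, it follows that f(n+1) ∈ X − f|_{{1,...,n}}({1,...,n})
= X − f({1,...,n}), and so f(n+1) ≤ y for all y ∈ X − f({1,...,n}).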

Let r ∈ N. Then f(r) ≤ y for all y ∈ X − f({1,...,r−1}), where we think of
{1,...,0} as the empty set when r = 1. Because f(r+1) ∈ X − f({1,...,r}) ⊆
X − f({1,...,r−1}), it follows that f(r) < f(r+1). By Exercise 6.3.4 we see that
f(n) ≥ n for all n ∈ N.

We now show that f is bijective. Let i, j ∈ N. Suppose that i ≠ j. Without loss
of generality assume that i < j. Then i ≤ j − 1, and also j > 1, so that j − 1 ∈ N.
It follows that f(i) ∈ f({1,...,j−1}), and as observed above we know that f(j) ∈
X − f({1,...,j−1}). Therefore f(i) ≠ f(j), and we deduce that f is injective.

Let m ∈ X. Suppose that m ≠ f(p) for any p ∈ N. Using a previous observation
we know that m ≤ f(m), and hence m < f(m). On the other hand, we saw above that
f(m) ≤ y for all y ∈ X − f({1,...,m−1}). By hypothesis on m we know that m ∉
f({1,...,m−1}), and it follows that f(m) ≤ m, which is a contradiction. Therefore
f is surjective.

We conclude that f is bijective, which implies that X ~ N. Hence X is countably 
infinite, and therefore countable. 


In order to show that a set is countable, it is necessary to show that it is either 
finite or countably infinite. It will be convenient to unify these two cases via the 
following theorem, which also has the advantage of allowing us to show only that a 
function is injective or surjective, rather than having to show that it is bijective. We 
will use this theorem in subsequent proofs. 


Theorem 6.6.8. Let A be a non-empty set. The following are equivalent. 


a. The set A is countable. 
b. There is an injective function f: A → N.
c. There is a surjective function g: N → A.


Proof. 


(a) ⇒ (b). Suppose that A is countable. There are two cases, depending upon
whether A is finite or countably infinite. If A is finite, there is a bijective function
k: A → {1,...,n} for some n ∈ N, and hence there is an injective function k: A →
N, because {1,...,n} ⊆ N. If A is countably infinite, there is a bijective function
h: A → N, which is injective.

(b) ⇒ (a). Suppose that there is an injective function f: A → N. Because f is
injective, it follows from Exercise 6.5.4 that A ~ f(A). By Theorem 6.6.7 we know
that f(A) is countable, and therefore A is countable.


(b) ⇔ (c). Suppose that there is an injective function f: A → N. By Theorem
4.4.5 (2) the function f has a left inverse, say g: N → A. By Exercise 4.4.13 (1)
we see that g is surjective. The other implication is proved similarly, and we omit the
details.
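
The step from an injective function f: A → N to a surjective function g: N → A can be made concrete when A is finite. The following Python sketch is an illustration under that extra assumption (the theorem itself also covers countably infinite sets). The names surjection_from_injection and default are hypothetical, and the particular set A and function f are made up for the example. The function g sends each value f(a) back to a, and sends every other natural number to one fixed element of A.

```python
def surjection_from_injection(f, A):
    """Given an injective f: A -> N on a finite non-empty list A, return a
    surjective g: N -> A that is a left inverse of f."""
    inverse = {f(a): a for a in A}      # well defined because f is injective
    default = A[0]                      # any fixed element of A will do
    return lambda n: inverse.get(n, default)


A = ["x", "y", "z"]
f = {"x": 5, "y": 2, "z": 9}.get        # an injective function A -> N
g = surjection_from_injection(f, A)
print([g(n) for n in range(1, 10)])
```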


Are unions, intersections and products of countable sets always countable? The 
answer is yes for intersections, as seen in Theorem 6.6.9 (1) below, but not always 
for unions and products. For example, let I be a non-empty set, and let {A_i}_{i∈I} be
a family of sets indexed by I. If I is uncountable, and if each A_i has at least one
element that is not in any other set A_k, then ⋃_{i∈I} A_i will have a subset that has the
same cardinality as I, and hence it is uncountable by Exercise 6.6.7. Also, the set
{0,1} is countable, and yet {0,1}^N is uncountable, as can be seen using the comment
after Definition 4.5.7 together with Exercise 6.7.7.

The best we can do regarding unions and products of countable sets is seen in the 
following two theorems. To form an intuitive picture of why the union of countably 
many countable sets is itself countable, consider first the union of two countable sets 
A and B. We can line up each of their elements as a_1, a_2, ... and b_1, b_2, .... We can
then line up the elements of A ∪ B as a_1, b_1, a_2, b_2, ..., where we drop any element
that is the same as an element previously listed (which could happen because A 
and B might have elements in common). A picture for the union of countably many 
countable sets is seen in Figure 6.6.1, where the elements of the union are “lined up” 
in the order shown by the arrows, and where again we drop any element that is the 
same as an element previously listed. 


Fig. 6.6.1. 


Theorem 6.6.9. Let I be a non-empty set, and let {A_i}_{i∈I} be a family of sets indexed
by I. Suppose that A_i is countable for each i ∈ I.
1. ⋂_{i∈I} A_i is countable.
2. If I is countable, then ⋃_{i∈I} A_i is countable.


Proof. 


(1). Choose some k ∈ I. Then ⋂_{i∈I} A_i ⊆ A_k, and hence ⋂_{i∈I} A_i is countable by
Theorem 6.6.7.


(2). If A_i = ∅ for all i ∈ I, then ⋃_{i∈I} A_i = ∅, which implies that ⋃_{i∈I} A_i is finite,
and hence countable. Now assume that A_k ≠ ∅ for some k ∈ I. Because the empty set
contributes nothing to a union of sets, the set ⋃_{i∈I} A_i will not be changed if we delete
from I those elements s ∈ I such that A_s = ∅. Let us assume that that has been done,
and therefore that A_i ≠ ∅ for all i ∈ I.

There are two cases, depending upon whether I is countably infinite or is finite.
We prove the former case, leaving the other case to the reader in Exercise 6.6.12.
Because we are assuming that I is countably infinite, without loss of generality we
may assume that I = N.

Because A_i is countable for all i ∈ I, then by Theorem 6.6.8 there is a surjective
function f_i: N → A_i for each i ∈ I. Let g: N → ⋃_{i∈I} A_i be defined as follows. Let r ∈
N. We can apply Exercise 6.3.14 to the function f: N → N defined by f(n) = (n − 1)n/2
for all n ∈ N, and we deduce that there are unique n, p ∈ N such that

\[
\frac{(n-1)n}{2} < r \le \frac{n(n+1)}{2} \quad \text{and} \quad r = \frac{(n-1)n}{2} + p.
\]

Let g(r) = f_{n−p+1}(p).

Let x ∈ ⋃_{i∈I} A_i. Then x ∈ A_k for some k ∈ I. Because f_k is surjective, there is
some w ∈ N such that x = f_k(w). Let t = k + w − 1. The reader can then verify that
g((t − 1)t/2 + w) = f_{t−w+1}(w) = f_k(w) = x. Therefore g is surjective, and it follows from
Theorem 6.6.8 that ⋃_{i∈I} A_i is countable.
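
The function g in the proof of Theorem 6.6.9 (2) walks through the pairs (i, p) along diagonals, and it can be computed directly. The following Python sketch is an illustration under stated assumptions: the family of surjective functions is passed as a single two-argument function f, where f(i, w) stands for f_i(w), and the helper name split_index is hypothetical. The sample family, in which A_i is the set of multiples of i and f_i(w) = i·w, is made up for the example.

```python
def split_index(r):
    """Return the unique (n, p) with (n-1)n/2 < r <= n(n+1)/2 and r = (n-1)n/2 + p."""
    n = 1
    while n * (n + 1) // 2 < r:     # find the least n with r <= n(n+1)/2
        n += 1
    return n, r - (n - 1) * n // 2


def g(r, f):
    """g(r) = f_{n-p+1}(p), where f(i, w) stands for f_i(w)."""
    n, p = split_index(r)
    return f(n - p + 1, p)


# Hypothetical family: A_i is the set of multiples of i, with f_i(w) = i * w.
values = [g(r, lambda i, w: i * w) for r in range(1, 7)]
print(values)
```

Repeated values in the output correspond to elements that belong to more than one of the sets A_i, which is harmless because g is only required to be surjective.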


Observe that in the proof of Theorem 6.6.9 (2), we simultaneously had to choose
a surjective function f_i: N → A_i for each i ∈ I; there really is a choice to be made,
because there is more than one such function for each i ∈ I (except when A_i has
only one element in it). Hence, we are making use of the Axiom of Choice (Theorem
4.1.5). To use that axiom formally in this proof, we would let S_i denote the
set of all surjective functions N → A_i for each i ∈ I, and we would apply the Axiom
of Choice to the family of sets {S_i}_{i∈I}; we omit the details. It is pointed out in
[Vau95, p. 56] that any proof of Theorem 6.6.9 (2) requires the Axiom of Choice.

Theorem 6.6.10. Let A_1, ..., A_n be sets for some n ∈ N. Suppose that A_1, ..., A_n are
countable. Then A_1 × ⋯ × A_n is countable.


Proof. The result is trivial when n = 1. In Exercise 6.6.8 there is a proof of this result 
for the case n = 2. The general result follows by induction on n; the details are left 
to the reader. 


Infinite sets come in different cardinalities, for example countable and uncount- 
able. As remarked after Definition 6.5.9, among the uncountable sets there are sets 
of different cardinalities. Among all the different types of infinite sets, it would seem 
intuitively plausible that countably infinite sets are the “smallest.” We now have the
tools to prove this fact. 


Theorem 6.6.11. Let A be a set. If A is infinite, then A has a countably infinite subset.


Proof. Suppose that A is infinite. By the Trichotomy Law for Sets (Theorem 6.5.13)
we know that N ≼ A or A ≼ N.

First, suppose that N ≼ A. Then there is an injective function f: N → A. By
Exercise 6.5.4 we know that N ~ f(N). Hence f(N) is a countably infinite subset of
A.

Second, suppose that A ≼ N. Then there is an injective function g: A → N. By
Exercise 6.5.4 again we know that A ~ g(A). Because g(A) ⊆ N, it follows from Theorem
6.6.7 that g(A) is countable. Hence A is countable. Because A is infinite, then
it must be countably infinite, and hence A has a countably infinite subset, namely,
itself.


We conclude this section with the following promised characterization of finite 
sets, for which we now have the necessary tools. 


Theorem 6.6.12. Let A be a set. Then A is finite if and only if A has no proper subset 
with the same cardinality as A. 


Proof. Suppose that A is finite. Let X ⊊ A. Theorem 6.6.5 (4) implies that X ≁ A.

Suppose that A is infinite. Then by Theorem 6.6.11 we know that A has a countably
infinite subset. Let X ⊆ A be countably infinite. By Exercise 6.6.6 there is a
function f: X → X that is injective but not surjective. Let g: A → A be defined by

\[
g(a) = \begin{cases} f(a), & \text{if } a \in X \\ a, & \text{if } a \in A - X. \end{cases}
\]

It is left to the reader to verify that g is injective but not surjective. Because g is
injective it follows from Exercise 6.5.4 that A ~ g(A), and because g is not surjective
we see that g(A) ⊊ A.
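
A concrete instance of the function g in this proof may help. The following Python sketch assumes, purely for illustration, that A = Z, that X = N = {1, 2, 3, ...}, and that f: X → X is given by f(n) = n + 1. None of these choices comes from the proof itself. The check is carried out on a finite sample of A only.

```python
def g(a):
    """g(a) = f(a) if a is in X, and g(a) = a otherwise, where X = {1, 2, ...}
    and f(n) = n + 1 (choices made for this sketch)."""
    return a + 1 if a >= 1 else a


sample = range(-5, 6)
images = [g(a) for a in sample]
print(images)
```

No two elements of the sample have the same image, and the number 1 is never an image, in agreement with the claim that g is injective but not surjective.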


The proof of Theorem 6.6.12 may be short, but it is not at all trivial, because 
it uses Theorem 6.6.11, which in turn uses the Trichotomy Law for Sets (Theo- 
rem 6.5.13), which in turn relies upon the Axiom of Choice. 

The characterization of finite sets in Theorem 6.6.12 is quite nice, because it 
does not make any reference to the natural numbers. Some authors in fact take this 
property as the definition of finiteness, and deduce our definition. An alternative way 
of stating this characterization of finiteness is that if A is a finite set, then a function 
f: A → A is bijective if and only if it is injective if and only if it is surjective. The
reader is asked to prove this fact in Exercise 6.6.4. For an infinite set B, by contrast, 
a surjective or injective function g: B — B need not be bijective; an example of an 
injective function that is not surjective is used in the proof of Theorem 6.6.12, and 
any left inverse of such a function would be surjective but not injective. 
Exercises


Exercise 6.6.1. [Used in Theorem 7.6.7.] Let A and B be sets. Suppose that A and B
are finite. Prove that A ∪ B is finite.


Exercise 6.6.2. [Used in Lemma 8.2.2.] Let A ⊆ N be a subset. Suppose that there is
some M ∈ N such that a < M for all a ∈ A. Prove that A is finite.


Exercise 6.6.3. [Used in Theorem 6.6.7.] Let A be a set. Prove that A is finite if and
only if there is an injective function f: A → {1,...,n} for some n ∈ N if and only if
there is a surjective function f: {1,...,n} → A for some n ∈ N.


Exercise 6.6.4. [Used in Section 6.6 and Theorem 7.7.4.] Let A and B be sets, and 
let f: A → B be a function. Suppose that A and B are finite sets, and that |A| = |B|.
Prove that f is bijective if and only if f is injective if and only if f is surjective. 


Exercise 6.6.5. [Used in Section 8.2.] Let F ⊆ N be a set. Suppose that F is finite
and non-empty. Use Theorem 6.3.11 (1) to prove that there is some k ∈ F such that
p ≤ k for all p ∈ F.


Exercise 6.6.6. [Used in Theorem 6.6.12.] Let X be a set. Suppose that X is countably
infinite. Prove that there is a function f: X → X that is injective but not surjective.


Exercise 6.6.7. [Used in Section 6.6 and Exercise 6.7.7.] Let A be a set. Prove that A 
is uncountable if and only if it contains an uncountable subset. 


Exercise 6.6.8. [Used in Theorem 6.6.10.] Let A and B be sets. Suppose that A and B
are countable. Prove that A × B is countable.


Exercise 6.6.9. Let A and F be sets. Suppose that A is a countably infinite set, and that
F is finite and non-empty.


(1) Prove that A × F is countably infinite by constructing an explicit bijective
function A × F → N. Suppose that F ~ {1,...,n} for some n ∈ N. First,
find a bijective function g: N × {1,...,n} → N, using the Division Algorithm
(Theorem A.5 in the Appendix). Second, use the function g to define a
bijective function f: A × F → N. Prove that the functions you have defined
are bijective.

(2) Prove that A × F is countably infinite by using results in the text, but without
finding an explicit bijective function A × F → N.


Exercise 6.6.10. Let A be an uncountable set, and let T be any non-empty set. Prove
that A × T is uncountable.


Exercise 6.6.11. Let A and B be sets. 


(1) Let f: A → B be a function. Suppose that f has a left inverse but no right
inverse. Prove that if A is infinite, or if B − f(A) is infinite and A has at least
two elements, then f has infinitely many left inverses.

(2) Let k: A → B be a function. Suppose that k has a right inverse but no left
inverse. Let

\[
S = \{ b \in B \mid k^{-1}(\{b\}) \text{ has more than one element} \}.
\]

Prove that if S is infinite, or if $k^{-1}(\{t\})$ is infinite for some t ∈ S, then k has
infinitely many right inverses.


Exercise 6.6.12. [Used in Theorem 6.6.9.] Prove Theorem 6.6.9 (2) in the case that I
is finite. Without loss of generality we may assume that I = {1,...,s} for some s ∈ N.
The fact that the result has been proved in the case where I is countably infinite can
be used in the proof of the finite case. [Use Exercise 6.3.14.]


6.7 Cardinality of the Number Systems 


In this section we use the results of the previous sections of this chapter to discuss 
the cardinality of the standard number systems, which are the natural numbers, the 
integers, the rational numbers, the real numbers and the complex numbers. Of course, 
the set N is countably infinite by definition. We know by Lemma 6.5.5 (1) that N 
is infinite. Because all the other number systems under discussion contain N, they 
are all infinite by Corollary 6.6.6. The question is then determining which number 
systems are countable and which are uncountable. 

We saw in Example 6.5.3 (2) that the set Z is countably infinite. If we think of the 
set of real numbers as forming the “number line,” we then view the integers as sitting 
discretely in R, that is, there are gaps between the integers. The rational numbers, 
by contrast, are “dense” in R, in that between any two real numbers, no matter how 
close, we can always find a rational number. A proof of this fact is beyond the scope 
of this book; see [Blo11, Theorem 2.6.13] for details. It therefore might appear that
there are “more” rational numbers than integers. The following theorem shows that 
our intuition here is deceiving. 


Theorem 6.7.1. The set Q is countably infinite. 


Proof. We have just remarked that the set Z is countably infinite, and hence it is
countable. Let Z* = Z − {0}. It follows from Exercise 6.5.8 (1) that Z* is also countable.
By Theorem 6.6.10 we know that Z × Z* is countable, and it follows from
Theorem 6.6.8 that there is a surjective function g: N → Z × Z*. Let f: Z × Z* → Q
be defined by f((m,n)) = m/n for all (m,n) ∈ Z × Z*. Given that Q consists of all
fractions, it is evident that f is surjective. By Lemma 4.4.4 (2) we see that f ∘ g is a
surjective function N → Q. Hence Q is countable by Theorem 6.6.8. Because Q is
infinite, as previously remarked, it is therefore countably infinite.


Theorem 6.7.1 tells us that in principle the elements of Q can be “lined up” in 
some order, so that Q has a first element, a second element and so on, in the same 
way that the elements of N are lined up, although this lining up of the elements of 
Q will not necessarily be according to increasing size. However, the proof of the 
theorem does not tell us explicitly how to line up the elements of Q. In Figure 6.7.1 
we see a diagram, due to Cantor, that summarizes a well-known way of lining up the 
positive rational numbers: follow the path indicated by the arrows, and drop every 
fraction that is equal to one that has already been encountered. (An alternative way to
line up the positive rational numbers is given in Exercise 6.7.9; this approach is a bit 
trickier than Cantor’s method, but it has the aesthetic appeal of never encountering 
any number twice, and therefore avoiding the need to drop repeated numbers as in 
Cantor’s method.) 


[Figure: Cantor's diagram of the positive rational numbers, with arrows marking the path to follow.]


Fig. 6.7.1. 
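
One way to make this lining up concrete is the following Haskell sketch. It simplifies the path in the figure by traversing every diagonal p + q = s in the same direction rather than zigzagging, which is an assumption of the sketch and changes only the order within each diagonal. Dropping every fraction that is equal to one already encountered amounts to keeping only the pairs (p,q) with greatest common divisor 1. The names diagonals and positiveRationals are illustrative and do not come from the text.

```haskell
import Data.Ratio ((%))

-- Pairs (p, q) of positive integers, listed diagonal by diagonal:
-- first those with p + q = 2, then p + q = 3, and so on.
diagonals :: [(Integer, Integer)]
diagonals = [ (p, s - p) | s <- [2 ..], p <- [1 .. s - 1] ]

-- Keep p/q only when it is in lowest terms; a fraction not in lowest terms
-- equals one that appeared on an earlier diagonal, so it is dropped.
positiveRationals :: [Rational]
positiveRationals = [ p % q | (p, q) <- diagonals, gcd p q == 1 ]

main :: IO ()
main = print (take 10 positiveRationals)
-- [1 % 1,1 % 2,2 % 1,1 % 3,3 % 1,1 % 4,2 % 3,3 % 2,4 % 1,1 % 5]
```

Because the list is lazy, any initial segment of the enumeration can be requested, which mirrors the idea that Q has a first element, a second element and so on.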


Another set of numbers that is countable, and which is even larger than Q, is the 
set of algebraic numbers, which is the set of all roots of polynomials with rational 
coefficients. (We are referring to real roots here, so that the set of algebraic numbers 
is a subset of R.) Every rational number is algebraic, but there are also many irrational
numbers that are algebraic, for example √2, which is a solution of the equation
x² − 2 = 0. There are also many non-algebraic numbers (called transcendental numbers),
for example π and e, though it is not trivial to prove that these numbers are not
algebraic; see [Her75, Section 5.2] for details. 


Theorem 6.7.2. The set of algebraic numbers is countably infinite. 


Proof. Left to the reader in Exercise 6.7.3. 


We now turn to the set of all real numbers, which is our first concrete example of 
an uncountable set. (We already saw an uncountable set in Corollary 6.5.8, but that 
set is not as familiar as R.) The proof that R is uncountable was a major breakthrough
due to Cantor. We follow his proof, often referred to as “Cantor’s diagonal argument.” 
For this proof we will need to use the fact that every real number can be expressed as 
an infinite decimal, and that this decimal expansion is unique if decimal expansions 
that eventually become the number 9 repeating are not allowed. The proof of this fact 
is beyond the scope of this book; see [Blo11, Section 2.8] for details. The rational
numbers can be shown to be precisely those real numbers with decimal expansions 
that are either repeating, or are zero beyond some point. 
Theorem 6.7.3. The set R is uncountable.


Proof. Suppose to the contrary that R is countable. Because R is infinite, as already
observed, it must be countably infinite. From Example 6.5.3 (4) we know
that (0,1) ~ R, and hence (0,1) must be countably infinite. Let f: N → (0,1)
be a bijective function. For each n ∈ N, we can write f(n) as an infinite decimal
$f(n) = 0.a_n^1 a_n^2 a_n^3 \ldots$, where the numbers $a_n^1, a_n^2, a_n^3, \ldots$ are integers in {0,1,...,9},
and where the expansion does not eventually become the number 9 repeating.

For each k ∈ N, let

\[
b_k = \begin{cases} 1, & \text{if } a_k^k \neq 1 \\ 2, & \text{if } a_k^k = 1. \end{cases}
\]

Observe that $b_k \neq a_k^k$ for all k ∈ N. Let b be the number represented by the decimal
expansion $b = 0.b_1 b_2 b_3 \ldots$. Because $b_k \neq 9$ for all k ∈ N, this decimal expansion
corresponds to a unique number in (0,1). We claim that b ≠ f(n) for all n ∈ N.
The decimal expansion of any real number is unique if it does not become the number
9 repeating, and therefore if two numbers have different such decimal expansions
(even if the difference is by only one digit) then the two numbers are not equal. For
each n ∈ N, the n-th digit in the decimal expansion of f(n) is $a_n^n$, whereas the n-th
digit in the decimal expansion of b is $b_n$. Hence b ≠ f(n) for all n ∈ N. We have
therefore reached a contradiction to the surjectivity of f, and we deduce that R is not
countable.


The proof of Theorem 6.7.3 is referred to as “Cantor’s diagonal argument” because
of the shaded line in Figure 6.7.2, which is the set of numbers that are modified
in order to define the number $b = 0.b_1 b_2 b_3 \ldots$.


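\[
\begin{array}{l}
f(1) = 0.\,\boxed{a_1^1}\; a_1^2\; a_1^3\; a_1^4 \ldots \\
f(2) = 0.\,a_2^1\; \boxed{a_2^2}\; a_2^3\; a_2^4 \ldots \\
f(3) = 0.\,a_3^1\; a_3^2\; \boxed{a_3^3}\; a_3^4 \ldots \\
f(4) = 0.\,a_4^1\; a_4^2\; a_4^3\; \boxed{a_4^4} \ldots
\end{array}
\]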


Fig. 6.7.2. 
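
The construction of the number b can be sketched in Haskell, modelling each number in (0,1) by its infinite list of decimal digits. This is only an illustration under stated assumptions: the type Digits, the function diagonal and the four sample rows are hypothetical and not from the text, and a program can of course inspect only finitely many rows, whereas the proof treats all of them at once.

```haskell
-- A number in (0,1) is modelled by its list of decimal digits after the point.
type Digits = [Int]

-- The k-th digit of the result is 1 if the k-th digit of the k-th row is not 1,
-- and 2 if it is 1, so the result differs from the k-th row in the k-th digit.
diagonal :: [Digits] -> Digits
diagonal rows =
  [ if row !! k /= 1 then 1 else 2 | (k, row) <- zip [0 ..] rows ]

-- Hypothetical sample rows (positions are counted from 0 in the code).
sample :: [Digits]
sample =
  [ cycle [1, 4, 2, 8, 5, 7]  -- 1/7  = 0.142857142857...
  , cycle [3]                 -- 1/3  = 0.333...
  , 5 : repeat 0              -- 1/2  = 0.5000...
  , cycle [0, 9]              -- 1/11 = 0.0909...
  ]

main :: IO ()
main = do
  let b = diagonal sample
  print b                                                        -- [2,1,1,1]
  print (and [ b !! k /= (sample !! k) !! k | k <- [0 .. 3] ])   -- True
```

Since every digit produced is 1 or 2, the resulting expansion never ends in repeating 9s, which is the point of the choice of digits in the proof.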


Although Cantor’s diagonal argument is a very nice proof, it is not quite as sim- 
ple as it might at first appear, because to put in all the details, it would be necessary to 
prove that every real number can be represented by an infinite decimal, and that this 
decimal representation is unique if decimal representations that eventually become 
the number 9 repeating are not allowed; unfortunately, such a proof is more diffi- 
cult than might be expected, in part because it relies upon the Least Upper Bound 
Property of the real numbers. This property of the real numbers is discussed very
briefly in Example 7.4.11 (2), though a full discussion awaits the reader in a course
on real analysis; see, for example, [Blo11, Section 2.6]. This reliance upon the decimal
representation of real numbers not only makes the Cantor diagonal argument
more difficult than it at first appears, but it is also somewhat bothersome in that the 
decimal representation of real numbers, while extremely useful from a computational 
perspective, is not conceptually at the heart of the real numbers. Hence, it would be 
nice to have a proof of the uncountability of the set of real numbers that is more 
directly related to the fundamental properties of these numbers. Such a proof, also 
due to Cantor (in fact prior to his diagonal argument), can be found in [Blo11, Theorem 8.4.8];
this proof, which makes use of sequences, is a special case of a more
general theorem in topology, seen in [Mun00, pp. 176-177]. 

Theorem 6.7.3 tells us that R ≁ N. There is, in fact, a much more precise relation
between the cardinalities of R and N, which is that R ~ P(N). A proof of this
fact, making use of the Schroeder—Bernstein Theorem (Theorem 6.5.10), is given in 
Exercise 6.7.8. 

The set of irrational numbers is defined to be the set of all real numbers that are 
not rational, that is, the set R − Q. There does not seem to be standard notation for
the set of irrational numbers; we will use IRR. The set IRR is uncountable, for if
not, then the real numbers would have to be a countable set as well by Theorem 6.7.1
and Theorem 6.6.9 (2), because R = Q ∪ IRR. The following theorem shows that in
fact IRR ~ R, which should not be taken as obvious, because not every uncountable
set has the same cardinality as R; for example P(R) ≁ R by Theorem 6.5.7.


Theorem 6.7.4. The set of irrational numbers has the same cardinality as R. 


Proof. We follow [Ham82]. Let P = {√2, 2√2, 3√2, ...}. We know that √2 ∈ IRR
by Theorem 2.3.5. Using Exercise 2.3.4 it follows that all other members of P are
also in IRR, and therefore P ⊆ IRR. It is straightforward to verify that P is countably
infinite; we omit the details. By Theorem 6.7.1 and Theorem 6.6.9 (2) we see that
Q ∪ P is countable, and by Lemma 6.5.5 (2) and Corollary 6.6.6 we see that Q ∪ P is
countably infinite. Hence Q ∪ P ~ P. We now use Exercise 6.5.6 (2) to see that

\[
\begin{aligned}
\mathbb{R} = \mathbb{Q} \cup \mathrm{IRR} &= \mathbb{Q} \cup [P \cup (\mathrm{IRR} - P)] = (\mathbb{Q} \cup P) \cup (\mathrm{IRR} - P) \\
&\sim P \cup (\mathrm{IRR} - P) = \mathrm{IRR}.
\end{aligned}
\]


The following theorem is, once again, slightly counterintuitive. 


Theorem 6.7.5. Let n ∈ N. Then Rⁿ ~ R.


Proof. The fact that R² ~ R follows immediately from Exercise 6.7.4 (2). The proof
that Rⁿ ~ R for arbitrary n is by induction on n; the details are left to the reader.


In Exercise 6.7.5 it is seen that the set of complex numbers C also has the same 
cardinality as R. 

We now turn to a rather curious issue concerning the cardinalities of sets of numbers.
Using the notation defined in Section 6.5, we know that N ≼ R, because the
inclusion function i: N → R is injective. From Theorem 6.7.3 we know that N ≁ R.
Is there a set X such that N ≺ X ≺ R? Cantor conjectured that there was no such set,
and this conjecture is known as the Continuum Hypothesis. You might be tempted to
try to look for such a set yourself, but you will not succeed. Nor, amazingly enough,
will you succeed in proving that no such set exists. Due to the remarkable work of
Kurt Gödel in 1938 and Paul Cohen in 1963, it turns out that the Continuum Hypothesis
is independent of the Zermelo–Fraenkel Axioms for set theory (see Section 3.5
for a discussion of these axioms). In other words, the Continuum Hypothesis can nei- 
ther be proved nor disproved from the Zermelo—Fraenkel Axioms. See [Mal79, Sec- 
tion 1.12] or [Vau95, Section 7.7] for further discussion (though the proof of Cohen’s 
result is to be found only in more advanced texts). It follows that we either need to 
be satisfied with not being able to resolve the Continuum Hypothesis, or we need to 
find new axioms for set theory. Mathematicians have stuck to the standard axioms, 
because they have worked well so far, and therefore have decided to live with the 
odd situation regarding the Continuum Hypothesis. 

We conclude this section with an application of cardinality to computer science. 


Example 6.7.6. There are many general-purpose computer programming languages, 
such as Pascal, C++, Java, Haskell and Prolog, each with its particular features and 
conceptual approach. See [Set96] for a discussion of programming languages. Com- 
mon to all these programming languages is that a program consists of a list of in- 
structions, written using various code words and symbols that the programmer can 
understand, and which is then translated by the computer into machine operations. 
For example, a very short program in Haskell is 


-- Binomial coefficient "n choose k", computed recursively via Pascal's rule.
binom :: Integer -> Integer -> Integer
binom n 0 = 1
binom n k = if k == n then 1
            else binom (n - 1) k + binom (n - 1) (k - 1)


(This program calculates binomial coefficients recursively; see Section 6.4 for a dis- 
cussion of Definition by Recursion, and Section 7.7 for a discussion of binomial 
coefficients. See [Hud00] for details about Haskell.) 

What do computer programs do? Fundamentally, they cause the computer to take 
various input data (which could be the empty set), and for each possible input, pro- 
duce some output data. Is there a programming language in which we could write 
sufficiently many different programs so that we could make the computer do any 
possible thing we might wish it to do? If not, then there would be a limitation on 
what we could do with computers. It might appear that this question would depend 
upon the type of computer (its memory, speed, etc.) and the choice of programming 
language. Somewhat surprisingly, it turns out that the answer to this question is the 
same for all computers and all computer languages: It is not possible to program any 
computer to do all possible things we might wish it to do. The key is the cardinality 
of sets. 

As seen in the above example of a computer program, any computer program is a 
finite string of symbols, constructed out of an allowed list of symbols. In Haskell, for 
example, the allowed symbols include the letters of the English alphabet (uppercase
and lowercase versions of the same letter are considered to be different symbols), the
digits 0,...,9, various symbols such as =, :, [, | and so on, and a blank space (which
we also think of as a symbol). Repeated blank spaces and new lines are ignored by
the computer (though they make it easier for human beings to read the code), so
we can ignore them too. For a given computer programming language, let Σ denote
the set of all possible symbols used, including the blank space symbol. The set Σ is
always finite. Using the symbols in Σ, we can then write computer programs, which
are simply certain finite strings of symbols in Σ, though of course not all strings will
be valid programs. Let S(Σ) denote the set of all finite strings with symbols in Σ, and
let C(Σ) denote the set of all valid programs using symbols in Σ. Then C(Σ) ⊆ S(Σ).

As stated above, a computer program causes the computer to take various input
data, and for each possible input, produce some output data. For a computer program
written with the symbols Σ, both the input and the output are finite strings
of symbols in Σ. Therefore each computer program in Σ causes the computer to
act as a function S(Σ) → S(Σ). The collection of all such functions is denoted
F(S(Σ), S(Σ)), using the notation of Section 4.5. Putting these observations together,
we see that each programming language using symbols in Σ gives rise to a function
Φ: C(Σ) → F(S(Σ), S(Σ)), where for each computer program p written with symbols
in Σ, we obtain the function Φ(p): S(Σ) → S(Σ).

Our question stated above asking whether there is a computer programming language
with which we could make the computer do anything we might wish it to do
can now be expressed by asking whether there is some programming language such
that the corresponding function Φ is surjective. If Φ were surjective, then every possible
function S(Σ) → S(Σ) could be obtained from at least one computer program.
On the other hand, if Φ were not surjective, then there would be at least one function
S(Σ) → S(Σ) that we might want the programming language to do that could not be
achieved.

The answer to our question is that regardless of the programming language and
the set of symbols Σ used, and regardless of the computer used, the function Φ is
never surjective. The reason is that C(Σ) is always countable, and F(S(Σ), S(Σ))
is always uncountable. The fact that there cannot be a surjective function from a
countable set to an uncountable one follows from Theorem 6.6.8; the details are left
to the reader.

To see that C(Σ) is countable, we will show that S(Σ) is countable, and then use
Theorem 6.6.7. The set S(Σ) is the collection of finite strings of elements of Σ. For
each n ∈ N, let $S_n(\Sigma)$ denote the set of strings of length n. Hence $S(\Sigma) = \bigcup_{n=1}^{\infty} S_n(\Sigma)$.
It can be seen that $S_n(\Sigma)$ is a finite set for each n ∈ N; this fact is intuitively clear, and
can be seen rigorously using the ideas of Section 7.7. It follows that each set $S_n(\Sigma)$
is countable, and hence $S(\Sigma) = \bigcup_{n=1}^{\infty} S_n(\Sigma)$ is countable by Theorem 6.6.9 (2).
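
The decomposition of the set of finite strings into the sets of strings of each fixed length also suggests an explicit way to list all finite strings, sketched here in Haskell. The names stringsOfLength and allStrings, and the two-letter alphabet "ab" used in the demonstration, are assumptions of this sketch; a real programming language has a much larger, but still finite, alphabet.

```haskell
-- All strings of length exactly n over the given alphabet; for an alphabet
-- with m symbols there are m^n of them, so this list is finite.
stringsOfLength :: [Char] -> Int -> [String]
stringsOfLength alphabet n = sequence (replicate n alphabet)

-- All non-empty finite strings, listed by increasing length. Every string
-- appears at some finite position, which is what countability requires.
allStrings :: [Char] -> [String]
allStrings alphabet = concatMap (stringsOfLength alphabet) [1 ..]

main :: IO ()
main = do
  print (take 6 (allStrings "ab"))          -- ["a","b","aa","ab","ba","bb"]
  print (length (stringsOfLength "ab" 3))   -- 8
```

Listing by increasing length matters: listing all strings that begin with one fixed symbol first would never reach the others.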

To see that F(S(Σ), S(Σ)) is uncountable, we start by observing that because
S(Σ) is countable, and is clearly infinite, it must be countably infinite. Hence S(Σ) ~
N. By Lemma 4.5.3 we see that F(S(Σ), S(Σ)) ~ F(N,N). Exercise 6.7.7 says that
F(N,N) is uncountable, and hence so is F(S(Σ), S(Σ)).
We therefore see that cardinality considerations imply that there is a theoretical
limitation to what can be accomplished by computer programming. See [Har96] for
further discussion. ♢


Exercises 


Exercise 6.7.1. Which of the following sets is countable, and which has the same
cardinality as R? Informal justification is acceptable.


(1) $\{\sqrt[n]{2} \mid n \in \mathbb{N}\}$.

(2) {q ∈ Q | q has denominator a multiple of 3 when q is expressed in lowest
terms}.

(3) Q ∩ [2,3).

(4) [3,4] ∪ [5,6].

(5) GL₃(Z), which is the set of invertible 3 × 3 matrices with integer entries.

(6) [0,1] × [0,1].

(7) {9^x | x ∈ R}.

(8) {S ⊆ N | S has 7 elements}.

(9) The set with elements that are the closed bounded intervals in R having ra- 
tional endpoints. 

[Use Exercise 6.5.6.] 


Exercise 6.7.2. Prove that the set 
S = {x ∈ (0,1) | the decimal expansion of x has only odd digits}


is uncountable. 
Exercise 6.7.3. [Used in Theorem 6.7.2.] 


(1) Let n ∈ N. Let $A_n$ be the set of all roots of polynomials of degree n with
rational coefficients. Prove that $A_n$ is countable. You may assume the fact
that a polynomial of degree n has at most n roots.

(2) Prove that the set of algebraic numbers is countably infinite. 


Exercise 6.7.4. [Used in Theorem 6.7.5.] 


(1) Prove that (0,1) × (0,1) ~ (0,1). Use the fact that every real number can be
expressed uniquely as an infinite decimal, if decimal expansions that eventually
become the number 9 repeating are not allowed.

(2) Let A and B be sets. Suppose that A ~ R and B ~ R. Prove that A × B ~ R.


Exercise 6.7.5. [Used in Section 6.7.] This exercise is for the reader who is familiar 
with the complex numbers. Prove that the set of complex numbers C has the same 
cardinality as R. 


Exercise 6.7.6. Let D be a partition of R such that each element of D is an interval
of some sort, other than an interval with only one element. Prove that D is countable. 
(Use the “density” of Q in R, as mentioned in the text.) 
Exercise 6.7.7. [Used in Section 6.6 and Example 6.7.6.] Prove that F(N, {0,1}) and
F(N,N) are uncountable. [Use Exercise 6.6.7.]


Exercise 6.7.8. [Used in Section 6.7.] The purpose of this exercise is to prove that
P(N) ~ R. As in the proof of Theorem 6.7.3, it will suffice to prove that P(N) ~
(0,1). We will use the facts about decimal expansions of real numbers that were
used in the proof of Theorem 6.7.3, as well as the analogous facts about binary
expansions, according to which every number in the interval (0,1) can be written
uniquely in the form $0.b_1 b_2 b_3 \ldots$, where the numbers $b_1, b_2, b_3, \ldots$ are in the set
{0,1}, and where the expansion does not eventually become the number 1 repeating.
See [Blo11, Section 2.8] for a detailed proof of these facts about decimal and binary
expansions.

By the Schroeder–Bernstein Theorem (Theorem 6.5.10), it will suffice to prove
that P(N) ≼ (0,1) and that (0,1) ≼ P(N).


(1) Use the decimal expansion of numbers in (0,1) to define an injective function
g: P(N) → (0,1).

(2) Use the binary expansion of numbers in (0,1) to define an injective function
f: (0,1) → P(N).


Exercise 6.7.9. [Used in Section 6.7.] In Theorem 6.7.1 we saw that the set Q is 
countably infinite, which told us that in principle the elements of Q could be “lined 
up” in some order just like the elements of N. Of course, it is nicer to see a concrete 
way of lining up the rational numbers, rather than just knowing that it is
possible to do so in principle. In Figure 6.7.1 we saw a well-known way of lining 
up the positive rational numbers, due to Cantor. In that figure, the positive rational 
numbers were lined up by following the arrows, and dropping every fraction that 
is equal to one that had already been encountered. In this exercise we discuss an 
alternative way of lining up the positive rational numbers, having the aesthetic appeal 
of never encountering any number twice, and therefore avoiding the need to drop 
repeated numbers as in Cantor’s procedure. (This alternative method is discussed in 
[CW00], where it is attributed to [Ste58].)

The alternative way of lining up the positive rational numbers is represented by 
the following diagram. 

[Diagram: rows of fractions, 1/1; then 1/2, 2/1; then 1/3, 3/2, 2/3, 3/1; and so on.]


This diagram is constructed using Definition by Recursion, starting with 1/1, and
then adding one row at a time, where the fractions in each row are obtained from
those in the previous row by taking every fraction a/b in the previous row and writing
a/(a+b) and (a+b)/b. We will prove below that every positive rational number, expressed
as a fraction in lowest terms, is obtained precisely once by this procedure. We can
therefore line up the positive rational numbers by stringing together the successive
rows in the diagram, yielding

\[
\frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{1}{3}, \frac{3}{2}, \frac{2}{3}, \frac{3}{1}, \frac{1}{4}, \frac{4}{3}, \frac{3}{5}, \frac{5}{2}, \frac{2}{5}, \frac{5}{3}, \frac{3}{4}, \frac{4}{1}, \ldots
\]

To prove that this procedure works, it is easier if we replace each fraction of the
form a/b with the ordered pair (a,b). Observe that the fraction a/b is in lowest terms if
and only if the two numbers a and b are relatively prime, as defined in Exercise 2.4.3.


As in Exercise 4.4.8, let L be the set defined by

\[
L = \{ (a,b) \in \mathbb{N} \times \mathbb{N} \mid a \text{ and } b \text{ are relatively prime} \},
\]

and let U, D: L → L be defined by U((a,b)) = (a+b,b) and D((a,b)) = (a,a+b)
for all (a,b) ∈ L (these functions are well-defined by Exercise 2.4.3).

We now define subsets $A_1, A_2, A_3, \ldots \subseteq L$ as follows. For each n ∈ N, the set $A_n$
will have $2^{n-1}$ elements, labeled as

\[
A_n = \{ c_n^1, c_n^2, \ldots, c_n^{2^{n-1}} \}.
\]

We define these elements using Definition by Recursion as follows. Let $c_1^1 = (1,1)$.
Now suppose that the set $A_n$ has been defined for some n ∈ N. Then define the elements
of $A_{n+1}$ by $c_{n+1}^{2k-1} = D(c_n^k)$ and $c_{n+1}^{2k} = U(c_n^k)$ for all $k \in \{1,\ldots,2^{n-1}\}$. It is seen
that this definition captures the procedure given in the above diagram.
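
The recursive definition of the rows can be turned directly into a short Haskell sketch, which may help in experimenting with the two claims below before proving them. The names u, d, rows and lineUp are illustrative; the pairs (a,b) stand for the fractions a/b, and the order within each row assumes that D is applied before U to each element, as in the row-by-row description of the procedure.

```haskell
type Pair = (Integer, Integer)

-- The functions U and D from the text.
u, d :: Pair -> Pair
u (a, b) = (a + b, b)
d (a, b) = (a, a + b)

-- rows !! (n - 1) lists the elements of the n-th row in order.
-- Each element c of a row contributes D c and then U c to the next row.
rows :: [[Pair]]
rows = iterate nextRow [(1, 1)]
  where
    nextRow row = concat [ [d c, u c] | c <- row ]

-- Stringing the successive rows together gives the proposed lining up.
lineUp :: [Pair]
lineUp = concat rows

main :: IO ()
main = do
  print (take 7 lineUp)
  -- [(1,1),(1,2),(2,1),(1,3),(3,2),(2,3),(3,1)]
  print (map length (take 5 rows))   -- [1,2,4,8,16]
```

Running the sketch for a few rows provides evidence for the claims, but it is not a substitute for the proofs requested in the exercise.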

To prove our desired result, we will show that $\bigcup_{i=1}^{\infty} A_i = L$, and that there are no
redundancies among the elements of the form $c_n^k$. More precisely, for each n ∈ N, let
$S_n = \bigcup_{i=1}^{n} A_i$. It will suffice to show that the following two claims hold for all n ∈ N.


(1) The set $S_n$ contains all (a,b) ∈ L such that 1 ≤ a ≤ n and 1 ≤ b ≤ n. (There
may also be other elements of L in $S_n$, but that does not matter.)

(2) All the elements in $S_n$ are distinct, which means that $c_i^x = c_j^y$ if and only if
i = j and x = y, for all i, j, x, y ∈ N such that 0 < i ≤ n, and 0 < j ≤ n, and
$1 \le x \le 2^{i-1}$ and $1 \le y \le 2^{j-1}$.


Prove both these claims. Use Exercise 2.4.3 and Exercise 4.4.8. 


Part III 
EXTRAS 


Having completed the basics, we now turn to a number of additional top- 
ics. These topics were chosen because they contain accessible ideas from 
important areas of modern mathematics, and because they make use of 
the concepts we have learned so far. Due to space limitations, we will 
be a bit more terse in this part of the text than previously. Each section 
of Chapter 7 gives a very brief introduction to a particular topic; further 
details await courses in those areas. Section 7.1 treats binary operations, 
Sections 7.2 and 7.3 treat groups, Sections 7.4 and 7.5 treat partially or- 
dered sets and lattices, Sections 7.6 and 7.7 treat enumeration, and Sec- 
tion 7.8 treats limits of sequences. In Chapter 8 we let the reader take 
over. In each of Sections 8.2—8.7 we briefly introduce a topic, which the 
reader is then urged to explore on her own; in Section 8.8 the reader 
has the opportunity to assume the role of a mathematics professor and 
critique some attempted proofs taken from actual homework exercises 
submitted by students. 
Selected Topics


Don’t just read it; fight it! Ask your own questions, look for your own ex- 
amples, discover your own proofs. 
— Paul Halmos (1916-2006) 


7.1 Binary Operations 


Among the most basic topics taught in elementary school mathematics are opera- 
tions such as addition and multiplication of numbers. Each such operation takes two 
numbers, and produces a single resulting number. Another type of operation is nega- 
tion of numbers, which takes a single number and produces another number. We can 
formalize both these types of operations using sets and functions. 


Definition 7.1.1. Let A be a set. A binary operation on A is a function A × A → A.
A unary operation on A is a function A → A. △


Let A be a set, and let ∗: A × A → A be a binary operation. If a, b ∈ A, then it
would be proper to denote the result of doing the operation ∗ to the pair (a,b) by
writing ∗((a,b)). Such notation is quite cumbersome, however, and would not look
like familiar binary operations such as addition of numbers. Hence, we will write
a ∗ b instead of ∗((a,b)).

Binary operations are used throughout mathematics. We will prove only one very
easy theorem about binary operations in this section, because it is hard to say much
of interest about binary operations in general. Rather, we will look at various examples,
and define certain important properties that binary operations may satisfy. An
important use of binary operations and these properties will be seen in Sections 7.2 and
7.3.


Example 7.1.2. 


(1) The sum of any two natural numbers is a natural number, and hence we can 
think of addition on N as a function +: N × N → N, which means that addition is a
binary operation on N. Subtraction is not a binary operation on N, because the difference
of two natural numbers is not always a natural number. However, subtraction
is a binary operation on Z. 

(2) This example involves 2 × 2 matrices. See any introductory text on linear
algebra, for example [AR05, Chapters 1 and 2], for the relevant information about
matrices. Let GL₂(R) denote the set of invertible 2 × 2 matrices with real number
entries. (The notion of an inverse matrix was discussed very briefly just before Theorem 2.5.2,
and is discussed in detail in any introductory text on linear algebra.) Such
matrices are precisely those with non-zero determinant. 

Let · denote matrix multiplication (again, see any introductory text on linear algebra
for details). Then · is a binary operation on GL₂(R), because the product of
two matrices with non-zero determinant also has non-zero determinant. On the other
hand, matrix addition is not a binary operation on GL₂(R), because two matrices
with non-zero determinant could add up to a matrix with zero determinant (the reader
should supply an example).

(3) Just as multiplication of numbers is often taught by using multiplication tables,
we can define binary operations on finite sets by using operation tables. For
example, let Z = {p,q,r}. We define a binary operation ∗ on Z by the operation table

    ∗ | p            q            r
    --+------------------------------
    p | r            p            q
    q | p            q            r
    r | r            [illegible]  p

To compute r ∗ p, for example, we look in the row containing r and the column
containing p, which yields r ∗ p = r. It is important not to reverse the role of rows
and columns when we use operation tables; for example, the entry in the column
containing r and the row containing p is q. Any table using the elements of Z as
entries would define a binary operation on Z. ♢


There are a number of useful properties that a binary operation might or might 
not satisfy. The first of these properties generalizes the fact that x + y = y + x for all
x, y ∈ R.


Definition 7.1.3. Let A be a set, and let ∗ be a binary operation on A. The binary
operation ∗ satisfies the Commutative Law (an alternative expression is that ∗ is
commutative) if a ∗ b = b ∗ a for all a, b ∈ A. △


Example 7.1.4. 


(1) The binary operations addition and multiplication on Z are both commutative. 
The binary operation subtraction on Z is not commutative; for example, we see that 
5 − 2 ≠ 2 − 5.

(2) The binary operation · defined in Example 7.1.2 (2) is not commutative. The
reader should supply an example of two matrices A, B ∈ GL₂(R) such that A · B ≠
B · A. Some pairs of matrices in GL₂(R) can be multiplied in either order without
changing the result, for example [matrix equation illegible]; however, because it
is not the case that all pairs can be multiplied in either order without changing the
result, the binary operation · is not commutative. (Even if the commutative property
fails for only a single pair of elements, then the binary operation is not commutative.)

(3) The binary operation ∗ defined in Example 7.1.2 (3) is not commutative, because
p ∗ r = q and r ∗ p = r. This non-commutativity can be seen easily by observing
that the entries of the operation table for ∗ are not symmetric with respect to the
downward sloping diagonal. ♢


The next property of binary operations generalizes the fact that (x + y) + z =
x + (y + z) for all x, y, z ∈ R.


Definition 7.1.5. Let A be a set, and let * be a binary operation on A. The binary
operation * satisfies the Associative Law (an alternative expression is that * is
associative) if (a * b) * c = a * (b * c) for all a, b, c ∈ A. △


Example 7.1.6. 


(1) The binary operations addition and multiplication on Z are both associative. 
The binary operation subtraction on Z is not associative; for example, we see that 
(5 − 2) − 1 ≠ 5 − (2 − 1).

(2) The binary operation · defined in Example 7.1.2 (2) is associative. This fact
can be proved directly by a tedious computation, or indirectly by using more
advanced facts from linear algebra; see [AR05, Chapter 1] for details.

(3) The binary operation * defined in Example 7.1.2 (3) is not associative, because
(r * p) * p = r * p = r, whereas r * (p * p) = r * r = p. In contrast to commutativity,
which can be verified quite easily for a binary operation given by an operation
table via symmetry with respect to the downward sloping diagonal, there is no
correspondingly simple way to verify associativity. The direct way to verify associativity
for a binary operation given by an operation table is simply to check all the possible
ways of combining three elements at a time; such verification is extremely tedious
when the set has more than a few elements. ♦
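Although checking every triple by hand is tedious, it is exactly the kind of task that a computer does well. The following Python sketch, offered as an illustration rather than as part of the formal development, checks all triples for the table of Example 7.1.2 (3); the function names are our own.

```python
from itertools import product

# Table of Example 7.1.2 (3): Z_TABLE[x][y] is x * y (row x, column y).
Z_TABLE = {
    "p": {"p": "r", "q": "p", "r": "q"},
    "q": {"p": "p", "q": "q", "r": "r"},
    "r": {"p": "r", "q": "r", "r": "p"},
}


def associativity_failures(table):
    """List the ordered triples (a, b, c) with (a * b) * c != a * (b * c)."""
    return [(a, b, c) for a, b, c in product(table, repeat=3)
            if table[table[a][b]][c] != table[a][table[b][c]]]


def is_associative(table):
    """Check every ordered triple; a set with n elements has n**3 of them."""
    return not associativity_failures(table)


failures = associativity_failures(Z_TABLE)
print(len(Z_TABLE) ** 3)            # number of triples examined
print(("r", "p", "p") in failures)  # the triple used in Part (3)
```

The number of triples grows as the cube of the number of elements, which is why the check is painful by hand but immediate by machine for small sets.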


As we discussed at the start of Section 3.4, it is the associativity of addition on R 
that allows us to write expressions such as 3 + 8 +5 without fear of ambiguity, given 
that the binary operation + applies to only two elements at a time. If you asked two 
people to calculate the sum 3 + 8 + 5 in their heads, one person might first add 3 and 
8, obtaining 11, and then add 5 to that, obtaining 16, and the other person might first 
add 8 and 5, obtaining 13, and then add 3 to that, obtaining 16. In other words, one 
person might do (3 + 8) + 5, whereas the other might do 3 + (8 + 5). Of course, the
same result would be obtained by either method, precisely because addition on R
is associative. The same idea holds for any other associative binary operation. That
is, if we are given an associative binary operation * on a set G, and three elements
a, b, c ∈ G, we can write a * b * c unambiguously, because it could be calculated as
either (a * b) * c or a * (b * c), and the same result would be obtained by either method.
This idea can be extended by recursion to allow us to combine any finite number of 
elements of G unambiguously using *, though we cannot use this method to combine 
infinitely many elements at once. 
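The claim that parentheses may be placed in any way for an associative operation can be explored numerically. The Python sketch below computes the set of all values obtained from every way of fully parenthesizing a finite list of elements, keeping the order of the elements fixed; the function name bracketing_values is a hypothetical choice for this illustration.

```python
import operator


def bracketing_values(op, xs):
    """Return the set of results over every full parenthesization of xs.

    The order of the elements is kept fixed; only the placement of the
    parentheses varies.
    """
    if len(xs) == 1:
        return {xs[0]}
    results = set()
    for split in range(1, len(xs)):
        for left in bracketing_values(op, xs[:split]):
            for right in bracketing_values(op, xs[split:]):
                results.add(op(left, right))
    return results


print(bracketing_values(operator.add, (3, 8, 5)))     # addition is associative
print(bracketing_values(operator.add, (3, 8, 5, 2)))  # four elements, five bracketings
print(bracketing_values(operator.sub, (5, 2, 1)))     # subtraction is not associative
```

For addition the set always has a single member, whereas for subtraction different placements of parentheses can give different answers.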

The next property of binary operations generalizes the unique role of the number 
0 in relation to addition of numbers, which is that x + 0 = x = 0 + x for all x ∈ R.


Definition 7.1.7. Let A be a set, and let * be a binary operation on A. An element
e ∈ A is an identity element for * if a * e = a = e * a for all a ∈ A. If * has an identity
element, the binary operation * satisfies the Identity Law. △


Observe that we need to specify both a * e = a and e * a = a for all a ∈ A in
Definition 7.1.7, because it cannot be assumed that * is commutative, and so knowing
only one of these equalities does not necessarily imply the other.


Example 7.1.8. 


(1) The binary operation multiplication on N has an identity element, the number
1, because n · 1 = n = 1 · n for all n ∈ N. The binary operation addition on N does
not have an identity element, because 0 ∉ N. On the other hand, addition on Z does
have an identity element, the number 0. The binary operation subtraction on Z does
not have an identity element. Even though n − 0 = n for all n ∈ Z, we observe that
0 − n ≠ n when n ≠ 0.

(2) The binary operation · defined in Example 7.1.2 (2) has an identity element,
namely, the identity matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, as the reader can verify.

(3) The binary operation * defined in Example 7.1.2 (3) has an identity element,
which is q. This fact can be verified directly by checking all possibilities, for example
p * q = p and q * p = p, and so on. The fact that q is an identity element can be seen
easily by observing in the operation table for * that the column below q is identical
to the column below *, and the row to the right of q is identical to the row to the right
of *.

(4) Let T = {k,m,n}, and let ∘ be the binary operation on T defined by the operation table


∘ | k m n
--+------
k | k m m
m | m n k
n | n k m


We see that ∘ has no identity element. It is true that m ∘ k = m and k ∘ m = m, that
k ∘ k = k (only one equality is needed here) and that n ∘ k = n, but we observe that
k ∘ n ≠ n, and that last fact is sufficient to rule out k as an identity element. It is easily
seen that no element other than k could be the identity element with respect to ∘; we
omit the details. ♦
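The search for an identity element in an operation table can also be carried out mechanically. The following Python sketch assumes the tables of Example 7.1.2 (3) and of Part (4) above as transcribed here; the helper make_table and the function identity_elements are names invented for this illustration.

```python
def make_table(elements, rows):
    """Build table[x][y] = x * y from the rows of an operation table."""
    return {x: dict(zip(elements, row)) for x, row in zip(elements, rows)}


Z_TABLE = make_table("pqr", ["rpq", "pqr", "rrp"])  # Example 7.1.2 (3)
T_TABLE = make_table("kmn", ["kmm", "mnk", "nkm"])  # Example 7.1.8 (4)


def identity_elements(table):
    """Return every e with a * e == a and e * a == a for all a.

    Both equalities are tested, because the operation need not be commutative.
    """
    return [e for e in table
            if all(table[a][e] == a and table[e][a] == a for a in table)]


print(identity_elements(Z_TABLE))
print(identity_elements(T_TABLE))
```

By Lemma 7.1.9 below, the returned list can never contain more than one element.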


Observe that in Definition 7.1.7, it is not stated that an identity element, if it 
exists, is unique. The following lemma shows, however, that uniqueness holds auto- 
matically. 


Lemma 7.1.9. Let A be a set, and let * be a binary operation on A. If * has an
identity element, the identity element is unique. 


Proof. Let e, ê ∈ A. Suppose that e and ê are both identity elements for *. Then
e = e * ê = ê, where in the first equality we are thinking of ê as an identity element,
and in the second equality we are thinking of e as an identity element. Therefore the
identity element is unique.

Because of Lemma 7.1.9, if a binary operation has an identity element, we can
refer to it as “the identity element.” 

The last property of binary operations we discuss generalizes the idea of the 
negation of a real number. The relevant property of negation is that it allows us to 
“cancel out” the original number. More precisely, we know that x + (−x) = 0 and
(−x) + x = 0 for all x ∈ R. In general, canceling out means obtaining the identity
element for the binary operation under consideration. Of course, it is only possible 
to define this property for a binary operation that has an identity element. 


Definition 7.1.10. Let A be a set, and let * be a binary operation on A. Let e ∈ A.
Suppose that e is an identity element for *. If a ∈ A, an inverse for a is an element
a′ ∈ A such that a * a′ = e and a′ * a = e. If every element in A has an inverse, the
binary operation * satisfies the Inverses Law. △


As in the definition of identity elements, we need to specify both a * a′ = e and
a′ * a = e for all a ∈ A, because we cannot assume that * is commutative.


Example 7.1.11. 


(1) Every element of Z has an inverse with respect to addition, namely, its negative.
On the other hand, not every element of Z has an inverse with respect to multiplication,
because the reciprocal of most integers is not an integer. Every element of
Q − {0} has an inverse with respect to multiplication, namely, its reciprocal.

(2) Let GL₂(R) and · be as in Example 7.1.2 (2). Every element of GL₂(R) has
an inverse, because GL₂(R) is the set of invertible 2 × 2 matrices with real entries;
recall that a 2 × 2 matrix A is invertible precisely if there is a 2 × 2 matrix B such that
A · B = $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and B · A = $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

(3) Let H = {a,b,c,d,e}, and let * be the binary operation on H defined by the
operation table

* | e a b c d
--+----------
e | e a b c d
a | a b e d e
b | b e c e a
c | c d e a c
d | d b a c b


It is seen that e is the identity element. We see that a * b = e = b * a, so b is an inverse
of a, and a is an inverse of b. We observe that c * b = e = b * c, so b is an inverse
of c, and c is an inverse of b. Therefore b has more than one inverse. We see that
e * e = e, so e is its own inverse. Finally, we see that d has no inverse, because there
is no x ∈ H such that d * x = e (it is the case that a * d = e, but for a to be the inverse
of d it would also have to be the case that d * a = e, and that is not true). ♦
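The bookkeeping in Part (3) can be double-checked by listing, for each element, all of its two-sided inverses. The Python sketch below does so, assuming the operation table for H as we have transcribed it; the names make_table and inverses are hypothetical.

```python
def make_table(elements, rows):
    """Build table[x][y] = x * y from the rows of an operation table."""
    return {x: dict(zip(elements, row)) for x, row in zip(elements, rows)}


# Table of Example 7.1.11 (3), with rows and columns in the order e, a, b, c, d.
H_TABLE = make_table("eabcd", ["eabcd", "abede", "becea", "cdeac", "dbacb"])


def inverses(table, identity):
    """Map each element a to the list of all x with a * x == e and x * a == e."""
    return {a: [x for x in table
                if table[a][x] == identity and table[x][a] == identity]
            for a in table}


for element, found in inverses(H_TABLE, "e").items():
    print(element, found)
```

An element with an empty list has no inverse, and an element with a list of length greater than one has more than one inverse; both phenomena occur here because * is not associative.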


Exercises 


Exercise 7.1.1. Which of the following formulas defines a binary operation on the
given set?

(1) Let * be defined by x * y = xy for all x, y ∈ {−1, −2, −3, ...}.

(2) Let ∘ be defined by x ∘ y = √(xy) for all x, y ∈ [2, ∞).

(3) Let ⊗ be defined by x ⊗ y = x − y for all x, y ∈ Q.

(4) Let ⋄ be defined by (x, y) ⋄ (z, w) = (x + z, y + w) for all (x, y), (z, w) ∈ R² −
{(0, 0)}.

(5) Let ⊙ be defined by x ⊙ y = |x + y| for all x, y ∈ N.

(6) Let ⊕ be defined by x ⊕ y = ln(|xy| − e) for all x, y ∈ N.


Exercise 7.1.2. For each of the following binary operations, state whether the bi- 
nary operation is associative, whether it is commutative, whether there is an identity 
element and, if there is an identity element, which elements have inverses. 


(1) The binary operation ⊕ on Z defined by x ⊕ y = −xy for all x, y ∈ Z.

(2) The binary operation * on R defined by x * y = x + 2y for all x, y ∈ R.

(3) The binary operation ⊗ on R defined by x ⊗ y = x + y − 7 for all x, y ∈ R.

(4) The binary operation * on Q defined by x * y = 3(x + y) for all x, y ∈ Q.

(5) The binary operation ∘ on R defined by x ∘ y = x for all x, y ∈ R.

(6) The binary operation ⋄ on Q defined by x ⋄ y = x + y + xy for all x, y ∈ Q.

(7) The binary operation ⊙ on R² defined by (x, y) ⊙ (z, w) = (4xz, y + w) for all
(x, y), (z, w) ∈ R².


Exercise 7.1.3. For each of the following binary operations given by operation ta- 
bles, state whether the binary operation is commutative, whether there is an identity 
element and, if there is an identity element, which elements have inverses. (Do not 
check for associativity.) 


Ql 23 xljabcde 
ay aldeabb 
2/232 ed aa 
3))1.2.3:s clabcde 
: djlbbadec 
kl 
Sat elbdeca. 
(2) k\j k lm olirsabc 
kl jl ilirsabec 
mijmlm. rirsicab 
alx y zw (5) slsirbca 
xlx zwy alabcisr 
(3) ylzewy x blbcaris 
zlwy xz cleabsri. 
wly x zw. 


Exercise 7.1.4. [Used in Section 7.2.] Find an example of a set and a binary opera- 
tion on the set such that the binary operation satisfies the Identity Law and Inverses 
Law, but not the Associative Law, and for which at least one element of the set has 
more than one inverse. The simplest way to solve this problem is by constructing an 
appropriate operation table. 


Exercise 7.1.5. Let n ∈ N. Recall the definition of the set Zₙ and the binary operation
· on Zₙ given in Section 5.2. Observe that [1] is the identity element for Zₙ with
respect to multiplication. Let a ∈ Z. Prove that the following are equivalent.


a. The element [a] ∈ Zₙ has an inverse with respect to multiplication.
b. The equation ax ≡ 1 (mod n) has a solution.
c. There exist p, q ∈ Z such that ap + nq = 1.


(It turns out that the three conditions listed above are equivalent to the fact that a 
and n are relatively prime, as defined in Exercise 2.4.3; a proof of that fact uses 
Theorem 8.2.6, though the reader need not be concerned with that proof.) 


Exercise 7.1.6. Let A be a set. A ternary operation on A is a function A × A ×
A → A. A ternary operation *: A × A × A → A is left-induced by a binary operation
⋄: A × A → A if *((a, b, c)) = (a ⋄ b) ⋄ c for all a, b, c ∈ A.

Is every ternary operation on a set left-induced by a binary operation? Give a 
proof or a counterexample. 


7.2 Groups 


As discussed in Section 7.1, some binary operations satisfy various nice properties, 
such as associativity and commutativity, whereas others do not. Certain combinations 
of these properties have been found, in retrospect, to be particularly widespread and 
useful. The most important such combination of properties is given in the following 
definition. 


Definition 7.2.1. Let G be a non-empty set, and let * be a binary operation on G.
The pair (G, *) is a group if * satisfies the Associative Law, the Identity Law and the
Inverses Law. △


Logically, it would have been possible to drop the non-emptiness requirement in 
Definition 7.2.1, because the empty set satisfies all three properties (even the Identity 
Law, because the identity element is only needed for use with existing elements, of 
which the empty set has none). However, the empty set is quite uninteresting as a 
group, and so to avoid special cases, we will assume that all groups have at least one 
element. 

Observe that Definition 7.2.1 does not require the Commutative Law. Though 
associativity may appear at first to be more obscure than commutativity, there turn 
out to be a number of important examples of binary operations where the former 
holds but the latter does not. One such example will be seen in Example 7.2.10 below. 
It is in order to include such examples that the definition of groups does not include 
the Commutative Law. On the other hand, groups that do satisfy the commutative 
property are particularly nice to work with, and merit a special name. 


Definition 7.2.2. Let (G, *) be a group. We say that (G, *) is an abelian group if *
satisfies the Commutative Law. △


Among many other uses, Definition 7.2.2 gives rise to the following well-known
mathematical joke. Question: What is purple and commutative? Answer: An abelian 
grape. (Mathematical jokes are rather scarce, so one cannot be overly picky about 
their quality.) 

Groups are relatively recent by mathematical standards, having arisen in the nine- 
teenth century, but they are now important in a wide variety of areas of both pure and 
applied mathematics, including geometry, algebraic topology, quantum mechanics 
and crystallography, to name just a few. Crystallography makes use of the centrality 
of groups in the rigorous study of symmetry. See [Fra03] or [Rot73], among many 
possible texts, for a more detailed treatment of group theory; see [Arm88] or [Bur85] 
for the connection between group theory and symmetry; and see [LP98, Chapter 6] 
for some applications of group theory. Our brief discussion of groups, in this section 
and the next, cannot even begin to hint at the many fascinating aspects of this topic. 

The term “group” is one of those words that has a standard colloquial meaning, 
and to which mathematicians have given a technical meaning that has little to do with 
the colloquial usage. The term “abelian” is in honor of the mathematician Niels Abel 
(1802-1829), who did important work in algebra. 

Formally, a group is a pair (G,*). However, when the binary operation * is under- 
stood from the context, or it is not important to designate the symbol for the binary 
operation, we will simply say “Let G be a group.” If we are discussing more than 
one group, we will write things such as “e_G” if we need to specify to which group an
identity element belongs. 


Example 7.2.3. 


(1) The pair (Z, +) is an abelian group, which is seen by combining Example 7.1.6 (1),
Example 7.1.8 (1), Example 7.1.11 (1) and Example 7.1.4 (1). Similarly, it is seen
that (Q, +), and (Q − {0}, ·), and (R, +), and (R − {0}, ·) are abelian
groups.

(2) Let {e} be a single element set, and let * be the only possible binary operation
on {e}, which is e * e = e. It is simple to verify that ({e}, *) is an abelian
group. This group is called the trivial group. Any two trivial groups, while perhaps 
labeled differently, are essentially identical, as will be discussed more precisely in 
Example 7.3.9 (2). 

(3) Let GL₂(R) and · be as in Example 7.1.2 (2). Then (GL₂(R), ·) is a group,
but not an abelian group, which is seen by combining Example 7.1.6 (2),
Example 7.1.8 (2), Example 7.1.11 (2) and Example 7.1.4 (2).

(4) Let V = {e,a,b,c}, and let ∘ be the binary operation on V defined by


∘ | e a b c
--+--------
e | e a b c
a | a b c e
b | b c e a
c | c e a b.


The pair (V, ∘) is an abelian group. To verify that the Associative Law holds is tedious,
and simply requires checking all possible ordered triples of elements of V; we will
omit the details, though the ambitious reader is invited to check all 64 cases. It is easy
to verify that e is the identity element, by observing in the operation table for ∘ that
the column below e is identical to the column below ∘, and similarly for the row to
the right of e. The elements e and b are their own inverses, and a and c are inverses of
each other. Hence (V, ∘) is a group. It is easy to verify that ∘ is commutative, because
the operation table is symmetric along the downward sloping diagonal. Hence (V, ∘)
is an abelian group.

(5) The pair (Z, *) given in Example 7.1.2 (3) is not a group. Although * does
have an identity element, which is q, this binary operation is not associative, as
discussed in Example 7.1.6 (3), and the element p does not have an inverse. ♦
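For a finite set with a binary operation given by an operation table, the three group axioms can be tested directly from Definition 7.2.1. The following Python sketch does this for (V, ∘) from Part (4) and (Z, *) from Part (5), with the tables as transcribed here; the function names are our own and the code is meant only as an illustration.

```python
from itertools import product


def make_table(elements, rows):
    """Build table[x][y] = x * y from the rows of an operation table."""
    return {x: dict(zip(elements, row)) for x, row in zip(elements, rows)}


V_TABLE = make_table("eabc", ["eabc", "abce", "bcea", "ceab"])  # Example 7.2.3 (4)
Z_TABLE = make_table("pqr", ["rpq", "pqr", "rrp"])              # Example 7.1.2 (3)


def is_group(table):
    """Test the Associative Law, the Identity Law and the Inverses Law."""
    elements = list(table)
    associative = all(table[table[a][b]][c] == table[a][table[b][c]]
                      for a, b, c in product(elements, repeat=3))
    identities = [e for e in elements
                  if all(table[e][a] == a == table[a][e] for a in elements)]
    if not associative or not identities:
        return False
    e = identities[0]
    return all(any(table[a][x] == e == table[x][a] for x in elements)
               for a in elements)


def is_abelian(table):
    """A group is abelian if, in addition, the Commutative Law holds."""
    return is_group(table) and all(table[a][b] == table[b][a]
                                   for a, b in product(table, repeat=2))


print(is_group(V_TABLE), is_abelian(V_TABLE))
print(is_group(Z_TABLE))
```

The associativity test here is the check of all 64 cases that was omitted in Part (4).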


The axioms for a group turn out to be surprisingly powerful, as can be seen 
from a full treatment of group theory (for which, of course, we do not have room in 
this book). We will discuss here only a few of the properties of groups that follow 
relatively straightforwardly from the axioms. We start with the observation that in 
the definition of a group, it is not stated that the identity element is unique, nor that 
each element has a unique inverse. However, we saw in Lemma 7.1.9 that if any 
binary operation has an identity element, then the identity element is unique, and 
so in particular that holds for groups. The following lemma shows that the inverse 
elements in groups are unique, though we remark that the uniqueness does not follow 
solely from the definition of inverses, but also requires the Associative Law, as seen 
by Exercise 7.1.4. 


Lemma 7.2.4. Let (G, *) be a group. If g ∈ G, then g has a unique inverse.


Proof. Left to the reader in Exercise 7.2.5. 


Because of Lemma 7.2.4, we can now refer to “the inverse” of a given element 
of the group. Another way of viewing this lemma is that if (G,*) is a group, and if 
a, b ∈ G are such that a * b = e and b * a = e, then b = a′. We will use this idea in the
proof of Theorem 7.2.5 (4). 

The following theorem generalizes some familiar properties of (R, +), for example
that −(x + y) = (−x) + (−y) for all x, y ∈ R.

Theorem 7.2.5. Let (G, *) be a group, and let a, b, c ∈ G.
1. If a * c = b * c, then a = b (Cancellation Law).
2. If c * a = c * b, then a = b (Cancellation Law).
3. (a′)′ = a.
4. (a * b)′ = b′ * a′.


Proof. We prove Part (4), leaving the rest to the reader in Exercise 7.2.6. 


(4). By Lemma 7.2.4 we know that a * b has a unique inverse. If we can show
that (a * b) * (b′ * a′) = e and (b′ * a′) * (a * b) = e, then it will follow that b′ * a′ is the
unique inverse for a * b, which means that (a * b)′ = b′ * a′. Using the definition of a
group we see that


(a * b) * (b′ * a′) = [(a * b) * b′] * a′ = [a * (b * b′)] * a′
                    = (a * e) * a′ = a * a′ = e.


A similar computation shows that (b′ * a′) * (a * b) = e.


Observe that Theorem 7.2.5 (4) is the generalization to arbitrary groups of the
fact that −(x + y) = (−x) + (−y) for all x, y ∈ R. Observe, however, that in the
statement of Theorem 7.2.5 (4) the order of a and b is reversed in the right-hand side of
the equation (a * b)′ = b′ * a′. For an arbitrary group (G, *), which does not necessarily
satisfy the Commutative Law, it is not always true that (a * b)′ = a′ * b′ for all
a, b ∈ G. The reader is asked to provide such an example in Exercise 7.2.7. However,
in the special case where (G, *) is (R, +), and where a′ is given by −a for all a ∈ R,
then Theorem 7.2.5 (4) does indeed say exactly that −(x + y) = (−x) + (−y) for all
x, y ∈ R. The fact that arbitrary groups do not necessarily satisfy the Commutative
Law is also the reason why both Parts (1) and (2) of Theorem 7.2.5 are needed.

A useful consequence of Theorem 7.2.5 (1) (2) is that if the binary operation of a 
group with finitely many elements is given by an operation table, then each element 
of the group appears once and only once in each row of the operation table and 
once and only once in each column (consider what would happen otherwise). We 
can therefore see instantly that (T, ∘) in Example 7.1.8 (4) is not a group, even if we
had not known from our discussion in that example that there is no identity element,
because the top row does not have the element n. On the other hand, just
because an operation table does have each element once and only once in each row 
and once and only once in each column does not guarantee that the operation yields 
a group; the reader is asked in Exercise 7.2.3 to find such an operation table. 
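The condition that every element appears once and only once in each row and each column can be tested mechanically. The Python sketch below reports every row or column of a table that is missing some element; in a table with n elements each row and column has n entries, so a missing element is equivalent to a repeated one. The tables are those of Example 7.1.8 (4) and Example 7.2.3 (4) as transcribed here, and the function names are hypothetical.

```python
def make_table(elements, rows):
    """Build table[x][y] = x * y from the rows of an operation table."""
    return {x: dict(zip(elements, row)) for x, row in zip(elements, rows)}


T_TABLE = make_table("kmn", ["kmm", "mnk", "nkm"])              # Example 7.1.8 (4)
V_TABLE = make_table("eabc", ["eabc", "abce", "bcea", "ceab"])  # Example 7.2.3 (4)


def missing_entries(table):
    """Report each row or column that fails to contain some element."""
    elements = set(table)
    report = []
    for a in table:
        missing = elements - set(table[a].values())
        if missing:
            report.append(("row", a, missing))
    for b in table:
        missing = elements - {table[a][b] for a in table}
        if missing:
            report.append(("column", b, missing))
    return report


print(missing_entries(T_TABLE))  # a non-empty report rules out a group
print(missing_entries(V_TABLE))  # an empty report is necessary, but not sufficient
```

An empty report does not by itself show that the table defines a group, as the exercise mentioned above makes clear.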

We now turn to the notion of a group inside another group. For example, the set 
Z sits inside the set Q, and both (Z,+) and (Q,+) are groups. We formalize this 
notion as follows. 


Definition 7.2.6. Let (G, *) be a group, and let H ⊆ G be a subset. The subset H is
a subgroup of G if the following two conditions hold.


(a) If a, b ∈ H, then a * b ∈ H.
(b) (H, *) is a group. △


Observe in Definition 7.2.6 that because (H, *) is itself a group, it must be non- 
empty, because all groups are assumed to be non-empty. Part (a) of the definition says 
that * is a binary operation on H. The following theorem allows for easy verification 
that a subset of a group is in fact a subgroup. The crucial issue is the notion of 
“closure” under the binary operation * and the unary operation ′.


Theorem 7.2.7. Let (G, *) be a group, and let H ⊆ G be a non-empty subset. Then
H is a subgroup of G if and only if the following two conditions hold.


(i) If a, b ∈ H, then a * b ∈ H.
(ii) If a ∈ H, then a′ ∈ H.


Proof. First suppose that H is a subgroup. Then Property (i) holds by the definition
of a subgroup. Let e be the identity element of G. Because (H, *) is a group, it has
an identity element, say ê. (We cannot assume, until we prove it, that ê is the same as
e.) Then ê * ê = ê thinking of ê as being in H, and e * ê = ê thinking of ê as being in
G. Hence ê * ê = e * ê. Because both ê and e are in G, we can use Theorem 7.2.5 (1)
to deduce that ê = e.

Now let a ∈ H. Because (G, *) is a group, the element a has an inverse a′ ∈ G. We
will show that a′ ∈ H. Because (H, *) is a group, the element a has an inverse â ∈ H. (Again,
we cannot assume, until we prove it, that â is the same as a′.) Using the definition
of inverses, and what we saw in the previous paragraph, we know that a′ * a = e and
â * a = e. Hence â * a = a′ * a. Using Theorem 7.2.5 (1) again, we deduce that a′ = â.
Because â ∈ H, it follows that a′ ∈ H. Therefore Property (ii) of the theorem holds.

Now suppose that Properties (i) and (ii) hold. To show that H is a subgroup, we
need to show that (H, *) is a group. We know that * is associative with respect to
all the elements of G, so it certainly is associative with respect to the elements of
H. Let b ∈ H. By Property (ii) we know that b′ ∈ H. By Property (i) we deduce that
b′ * b ∈ H, and hence e ∈ H. Because e is the identity element for all the elements of
G, it is certainly the identity element for all the elements of H. By Property (ii) we
now know that every element of H has an inverse in H. Hence H is a group.


The following corollary can be deduced immediately from the proof of Theo- 
rem 7.2.7. 


Corollary 7.2.8. Let G be a group, and let H ⊆ G be a subgroup. Then the identity
element of G is in H, and it is the identity element of H. The inverse operation in H 
is the same as the inverse operation in G. 


Example 7.2.9. 


(1) The set Q is a subgroup of (R,+), and the set Z is a subgroup of each of 
(Q,+) and (R,+). 

(2) Let (G,*) be a group. Let e be the identity element of G. Then {e} and G are 
both subgroups of G. The subgroup {e} is often called the trivial subgroup of G. 

(3) Let (V,o) be as in Example 7.2.3 (4). By checking all possibilities, it is seen 
that the only subgroups of V are {e}, {e,b} and V. ♦
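The phrase "by checking all possibilities" in Part (3) can be made concrete with the help of Theorem 7.2.7: a non-empty subset is a subgroup precisely if it is closed under the binary operation and under inverses. The Python sketch below applies this test to every non-empty subset of V, assuming the operation table of Example 7.2.3 (4) as transcribed here; all names in the code are our own.

```python
from itertools import combinations


def make_table(elements, rows):
    """Build table[x][y] = x * y from the rows of an operation table."""
    return {x: dict(zip(elements, row)) for x, row in zip(elements, rows)}


V_TABLE = make_table("eabc", ["eabc", "abce", "bcea", "ceab"])  # Example 7.2.3 (4)
IDENTITY = "e"


def inverse(a):
    """Return the inverse of a in (V, o)."""
    return next(x for x in V_TABLE
                if V_TABLE[a][x] == IDENTITY == V_TABLE[x][a])


def is_subgroup(subset):
    """Apply the two conditions of Theorem 7.2.7 to a non-empty subset of V."""
    closed_under_operation = all(V_TABLE[a][b] in subset
                                 for a in subset for b in subset)
    closed_under_inverses = all(inverse(a) in subset for a in subset)
    return bool(subset) and closed_under_operation and closed_under_inverses


subgroups = [set(c)
             for size in range(1, len(V_TABLE) + 1)
             for c in combinations(V_TABLE, size)
             if is_subgroup(set(c))]
print(subgroups)
```

The search examines all 15 non-empty subsets of V, which is feasible only because V is so small.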


We conclude this section with a very brief example of the relation between groups 
and symmetry. 


Example 7.2.10. We wish to list all possible symmetries of an equilateral triangle, 
as shown in Figure 7.2.1 (i). The letters A, B and C are not part of the triangle, but 
are added for our convenience. Mathematically, a symmetry of an object in the plane 
is an isometry of the plane (that is, a motion that does not change lengths between 
points) that takes the object onto itself. In other words, a symmetry of an object in
the plane is an isometry of the plane that leaves the appearance of the object un- 
changed. See [Rya86] for more about isometries. Because the letters A, B and C in 
Figure 7.2.1 (i) are not part of the triangle, a symmetry of the triangle may inter- 
change these letters; we use the letters to keep track of what the isometry did. There 
are only two types of isometries that will leave the triangle looking unchanged: re- 
flections (that is, flips) of the plane in certain lines, and rotations of the plane by 
certain angles about the center of the triangle. In Figure 7.2.1 (ii) we see the three
possible lines in which the plane can be reflected without changing the appearance
of the triangle. Let M₁, M₂ and M₃ denote the reflections of the plane in these lines.
For example, if we apply reflection M₂ to the plane, we see that the vertex labeled
C is unmoved, and that the vertices labeled A and B are interchanged, as seen in
Figure 7.2.1 (iii). The only two possible rotations of the plane about the center of
the triangle that leave the appearance of the triangle unchanged are rotation by 120°
clockwise and rotation by 240° clockwise, denoted R₁₂₀ and R₂₄₀. We do not need
rotation by 120° counterclockwise and rotation by 240° counterclockwise, even though
they also leave the appearance of the triangle unchanged, because they have the same
net effect as rotation by 240° clockwise and rotation by 120° clockwise, respectively,
and it is only the net effect of isometries that is relevant to the study of symmetry.
Let I denote the isometry of the plane that does not move anything, that is, rotation
by 0°.


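[Figure: (i) an equilateral triangle with vertices labeled A, B and C; (ii) the three lines of reflection; (iii) the triangle after the reflection M₂.]
Fig. 7.2.1.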


The set G = {I, R₁₂₀, R₂₄₀, M₁, M₂, M₃} is the collection of all isometries of the
plane that take the equilateral triangle onto itself. Each of these isometries can be
thought of as a function R² → R², and as such we can combine these isometries by
composition of functions. It can be proved that the composition of isometries is an
isometry, and therefore composition becomes a binary operation on the set G; we
omit the details. (Alternatively, it would be possible to use brute force to check all 36
possible ways of forming compositions of pairs of these six isometries, and it would
be seen that the composition of any two of these six isometries is also one of these
six isometries, again showing that composition is a binary operation on G.) We can
then form the operation table


∘    | I     R₁₂₀  R₂₄₀  M₁    M₂    M₃
-----+----------------------------------
I    | I     R₁₂₀  R₂₄₀  M₁    M₂    M₃
R₁₂₀ | R₁₂₀  R₂₄₀  I     M₂    M₃    M₁
R₂₄₀ | R₂₄₀  I     R₁₂₀  M₃    M₁    M₂
M₁   | M₁    M₃    M₂    I     R₂₄₀  R₁₂₀
M₂   | M₂    M₁    M₃    R₁₂₀  I     R₂₄₀
M₃   | M₃    M₂    M₁    R₂₄₀  R₁₂₀  I


Composition of functions is associative, as proved in Lemma 4.3.5 (1), and hence
this binary operation on G is associative. Observe that I is an identity element. It is
seen that I, M₁, M₂ and M₃ are their own inverses, and that R₁₂₀ and R₂₄₀ are inverses
of each other. Therefore (G, ∘) is a group. This group is not abelian, however. For
example, we see that R₁₂₀ ∘ M₁ ≠ M₁ ∘ R₁₂₀. The subgroups of G, other than {I} and G
itself, are {I, R₁₂₀, R₂₄₀}, {I, M₁}, {I, M₂} and {I, M₃}.

The group G is called the symmetry group of the equilateral triangle. ♦
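The structure of this example can be explored on a computer by recording each symmetry through its effect on the three vertex positions. The Python sketch below rests on an assumption not proved in the text, namely that a symmetry of the triangle is determined by how it permutes the vertices and that all six permutations arise in this way. The positions are numbered 0, 1, 2, and we do not attempt to say which tuple corresponds to which of M₁, M₂ and M₃, because that depends on the labeling in Figure 7.2.1.

```python
from itertools import permutations, product

# A symmetry is recorded by where it sends the three vertex positions 0, 1, 2:
# the tuple s sends position i to position s[i].
SYMMETRIES = list(permutations(range(3)))
IDENTITY = (0, 1, 2)


def compose(s, t):
    """Return s o t, the symmetry obtained by applying t first and then s."""
    return tuple(s[t[i]] for i in range(3))


# Composition is a binary operation on the set of six symmetries.
closed = all(compose(s, t) in SYMMETRIES
             for s, t in product(SYMMETRIES, repeat=2))

# The identity and the three reflections are their own inverses.
self_inverse = [s for s in SYMMETRIES if compose(s, s) == IDENTITY]

# Ordered pairs that do not commute show that the group is not abelian.
non_commuting = [(s, t) for s, t in product(SYMMETRIES, repeat=2)
                 if compose(s, t) != compose(t, s)]

print(len(SYMMETRIES), closed)
print(self_inverse)
print(len(non_commuting) > 0)
```

The convention that s ∘ t applies t first matches the usual convention for composition of functions.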


Similarly to the equilateral triangle in Example 7.2.10, every object in Euclidean 
space has a corresponding symmetry group, though such groups are often much more 
complicated than the symmetry group of the equilateral triangle. Because groups 
have been widely studied by mathematicians, it turns out that quite a lot can be 
proved about symmetry groups. For example, group theory has been used to obtain 
rather surprising results about the symmetries of ornamental patterns such as frieze 
patterns and wallpaper patterns. See [Arm88] or [Bur85] for more about symmetry 
groups. 


Exercises 


Exercise 7.2.1. Which of the following sets and binary operations are groups? 
Which of the groups are abelian? 


(1) The set (0, 1], and the binary operation multiplication. 

(2) The set of positive rational numbers, and the binary operation multiplication. 
(3) The set of even integers, and the binary operation addition. 

(4) The set of even integers, and the binary operation multiplication. 

(5) The set Z, and the binary operation * on Z defined by a * b = a − b for all
a, b ∈ Z.

(6) The set Z, and the binary operation * on Z defined by a * b = ab + a for all
a, b ∈ Z.

(7) The set Z, and the binary operation ⋄ on Z defined by a ⋄ b = a + b + 1 for all
a, b ∈ Z.

(8) The set R − {−1}, and the binary operation ⊙ on R − {−1} defined by a ⊙
b = a + b + ab for all a, b ∈ R − {−1}.


Exercise 7.2.2. Let P = {a,b,c,d,e}. Find a binary operation * on P given by an
operation table such that (P, *) is a group.


Exercise 7.2.3. [Used in Section 7.2.] Find an example of a set and a binary operation
on the set given by an operation table such that each element of the set appears once
and only once in each row of the operation table and once and only once in each
column, but the set together with this binary operation is not a group.


Exercise 7.2.4. Let A be a set. Define the binary operation Δ on P(A) by X Δ Y =
(X − Y) ∪ (Y − X) for all X, Y ∈ P(A). (This binary operation is called symmetric
difference; some properties of symmetric difference are proved in Exercise 3.3.14.)
Prove that (P(A), Δ) is an abelian group.


Exercise 7.2.5. [Used in Lemma 7.2.4.] Prove Lemma 7.2.4. 
Exercise 7.2.6. [Used in Theorem 7.2.5.] Prove Theorem 7.2.5 (1) (2) (3). 


Exercise 7.2.7. [Used in Section 7.2.] Find an example of a group (G, *), and elements
a, b ∈ G, such that (a * b)′ ≠ a′ * b′.


Exercise 7.2.8. Let A be a non-empty set, and let * be a binary operation on A.
Suppose that * satisfies the Associative Law and the Identity Law, and that it also
satisfies the Right Inverses Law, which states that for each a ∈ A there is an element
b ∈ A such that a * b = e, where e is the identity element for *.


(1) Prove that (A, *) satisfies Theorem 7.2.5 (1).
(2) Prove that * satisfies the Inverses Law, and hence (A, *) is a group.


Exercise 7.2.9. Let (G, *) be a group. Prove that the following are equivalent.


a. G is abelian.

b. aba′b′ = e for all a, b ∈ G.

c. (ab)² = a²b² for all a, b ∈ G.


Exercise 7.2.10. Let (G, *) be a group. Suppose that K ⊆ H ⊆ G. Prove that if K is
a subgroup of H, and H is a subgroup of G, then K is a subgroup of G.


Exercise 7.2.11. Let GL₂(R) and · be as in Example 7.1.2 (2). Let SL₂(R) denote the
set of all 2 × 2 matrices with real entries that have determinant 1. Prove that SL₂(R)
is a subgroup of GL₂(R). (This exercise requires familiarity with basic properties of
determinants.)


Exercise 7.2.12. Let n ∈ N. Recall the definition of the set Zₙ and the operations +
and · on Zₙ given in Section 5.2.


(1) [Used in Example 7.3.2 and Exercise 7.3.3.] Prove that (Zₙ, +) is an abelian
group.
(2) Suppose that n is not a prime number. Then n = ab for some a, b ∈ N such
that 1 < a < n and 1 < b < n. Prove that the set {[0], [a], [2a], ..., [(b − 1)a]}
is a subgroup of Zₙ.
(3) Is (Zₙ − {[0]}, ·) a group for all n? If not, can you find any conditions on n
that would guarantee that (Zₙ − {[0]}, ·) is a group?


Exercise 7.2.13. Let (G, *) be a group. Prove that if x′ = x for all x ∈ G, then G is
abelian. Is the converse to this statement true?


Exercise 7.2.14. Describe the symmetry group of a square, similarly to our description
of the symmetry group of an equilateral triangle in Example 7.2.10. Find all the
subgroups of the symmetry group of the square.


7.3 Homomorphisms and Isomorphisms 


What does it mean for two groups to be “the same”? Consider the group (V, ∘) in 
Example 7.2.3 (4). We then form a new group (W, ⋄), where W = {I, F, G, H}, and 
where ⋄ is defined by the same operation table as for ∘, with I replacing e, with F 
replacing a, with G replacing b and with H replacing c. Formally, the group (W, ⋄) 
is not identical to the group (V, ∘), and yet we would certainly like to consider them 
essentially the same. This concept is formalized by the use of functions between 
groups. Such functions need to be bijective, and must “preserve the group operation.” 
This latter notion is meaningful even for non-bijective functions, and we start by 
making it precise in the following definition. 


Definition 7.3.1. Let (G, ∗) and (H, ⋄) be groups, and let f: G → H be a function. 
The function f is a homomorphism (sometimes called a group homomorphism) if 
f(a ∗ b) = f(a) ⋄ f(b) for all a, b ∈ G. 


Example 7.3.2. 


(1) We consider two examples of functions from (Z, +) to (Q, +). Let f: Z → Q 
be defined by f(n) = n/3 for all n ∈ Z. If n, m ∈ Z, then f(n + m) = (n + m)/3 = 
n/3 + m/3 = f(n) + f(m). Hence f is a homomorphism. Let g: Z → Q be defined by 
g(n) = n² for all n ∈ Z. If n, m ∈ Z, then g(n + m) = (n + m)² = n² + 2nm + m², whereas 
g(n) + g(m) = n² + m², and so g(n + m) is not always equal to g(n) + g(m), which 
means that g is not a homomorphism. 

(2) It is straightforward to verify that ((0, ∞), ·) is a group. Let h: R → (0, ∞) 
be defined by h(x) = e^x for all x ∈ R. Then h is a homomorphism from (R, +) to 
((0, ∞), ·), because h(x + y) = e^{x+y} = e^x · e^y = h(x) · h(y) for all x, y ∈ R. 

(3) Let (V, ∘) be as in Example 7.2.3 (4). Let k: V → V be defined by k(e) = 
e, and k(a) = b, and k(b) = e, and k(c) = b. Then k is a homomorphism. Rather 
than verifying that k(x ∘ y) = k(x) ∘ k(y) for all x, y ∈ V by checking all possibilities 
directly, we consider the following four cases. First, suppose that x, y ∈ {e, b}. Then 
x ∘ y ∈ {e, b}, and hence k(x) = e, and k(y) = e, and k(x ∘ y) = e. It follows that 
k(x ∘ y) = e = e ∘ e = k(x) ∘ k(y). Second, suppose that x ∈ {e, b} and y ∈ {a, c}. 
Then x ∘ y ∈ {a, c}, and hence k(x) = e, and k(y) = b, and k(x ∘ y) = b. It follows 
that k(x ∘ y) = b = e ∘ b = k(x) ∘ k(y). The other two cases, which are x ∈ {a, c} and 
y ∈ {e, b}, or x, y ∈ {a, c}, are similar, and we leave the details to the reader. 

(4) Let n ∈ N. Recall the definition of the set Z_n, and the operations + and 
· on Z_n given in Section 5.2. We know from Exercise 7.2.12 (1) that (Z_n, +) is 
an abelian group. Recall the definition of the canonical map γ: Z → Z_n given in 
Definition 5.2.13. We know from Lemma 5.2.15 that γ(a + b) = γ(a) + γ(b) and 
γ(ab) = γ(a) · γ(b) for all a, b ∈ Z. The first of these two properties, involving 
addition, states that γ is a homomorphism from (Z, +) to (Z_n, +); more precisely, the 
first property states that γ is a group homomorphism. It turns out that Z and Z_n both 
have the structure of a “ring,” which involves two binary operations (in this case 
addition and multiplication) that satisfy certain properties. The two properties of γ, 
which involve both addition and multiplication, together state that γ is in fact a “ring 
homomorphism.” We will not discuss rings in this text; the reader can find this topic 
in any introductory abstract algebra text, for example [Fra03]. ♦ 
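
The homomorphism condition of Definition 7.3.1 can be spot-checked numerically. The following Python sketch is only an illustration: the helper name is_homomorphism, the finite sample sets, and the use of the residues n mod 6 as a stand-in for Z_6 are assumptions made here and are not part of the text. Passing such a check on finitely many samples is evidence, not a proof.

```python
import math

def is_homomorphism(f, op_g, op_h, samples, equal=lambda u, v: u == v):
    """Check f(a op_g b) == f(a) op_h f(b) on every sampled pair (a, b)."""
    return all(equal(f(op_g(a, b)), op_h(f(a), f(b)))
               for a in samples for b in samples)

add = lambda a, b: a + b
mul = lambda a, b: a * b

ints = range(-5, 6)  # finite sample of Z (assumption of this sketch)

# Part (1): g(n) = n^2 fails, for instance g(1 + 1) = 4 but g(1) + g(1) = 2.
g_is_hom = is_homomorphism(lambda n: n * n, add, add, ints)

# Part (2): h(x) = e^x turns addition into multiplication (up to rounding).
reals = [-2.0, -0.5, 0.0, 1.0, 3.25]
h_is_hom = is_homomorphism(math.exp, add, mul, reals, equal=math.isclose)

# Part (4): the canonical map Z -> Z_n, modelled here as a -> a mod 6.
n = 6
gamma = lambda a: a % n
add_mod = lambda a, b: (a + b) % n
mul_mod = lambda a, b: (a * b) % n
gamma_respects_add = is_homomorphism(gamma, add, add_mod, ints)
gamma_respects_mul = is_homomorphism(gamma, mul, mul_mod, ints)

print(g_is_hom, h_is_hom, gamma_respects_add, gamma_respects_mul)
```

The comparison for h uses math.isclose because floating-point arithmetic only approximates the real numbers.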


Homomorphisms of groups preserve the basic group structure, that is, the group 
operation. The following theorem shows that a group homomorphism also preserves 
some of the other features of groups. 


Theorem 7.3.3. Let G, H be groups, and let f: G → H be a homomorphism. Let e_G 
and e_H be the identity elements of G and H, respectively. 


1. f(e_G) = e_H. 
2. If a ∈ G, then f(a′) = [f(a)]′, where the first inverse is in G, and the second 
is in H. 
3. If A ⊆ G is a subgroup of G, then f(A) is a subgroup of H. 
4. If B ⊆ H is a subgroup of H, then f⁻¹(B) is a subgroup of G. 


Proof. We will prove Parts (2) and (3), leaving the rest to the reader in Exercise 7.3.6. 
Let ∗ and ⋄ be the binary operations of G and H, respectively. 


(2). Let a ∈ G. Then f(a) ⋄ f(a′) = f(a ∗ a′) = f(e_G) = e_H, where the last 
equality uses Part (1) of this theorem, and the other two equalities use the fact 
that f is a homomorphism and that G is a group. A similar calculation shows that 
f(a′) ⋄ f(a) = e_H. By Lemma 7.2.4, it follows that [f(a)]′ = f(a′). 


(3). By Corollary 7.2.8 we know that e_G ∈ A, and by Part (1) of this theorem 
we know that e_H = f(e_G) ∈ f(A). Hence f(A) is non-empty. We can therefore use 
Theorem 7.2.7 to show that f(A) is a subgroup of H. Let x, y ∈ f(A). Then there are 
a, b ∈ A such that x = f(a) and y = f(b). Hence x ⋄ y = f(a) ⋄ f(b) = f(a ∗ b), because 
f is a homomorphism. Because A is a subgroup of G we know that a ∗ b ∈ A, and 
hence x ⋄ y ∈ f(A). Using Part (2) of this theorem, we see that x′ = [f(a)]′ = f(a′). 
Because A is a subgroup of G, it follows from Theorem 7.2.7 (ii) that a′ ∈ A, and 
hence x′ ∈ f(A). We now use Theorem 7.2.7 to deduce that f(A) is a subgroup of H. 
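
To see the four parts of Theorem 7.3.3 in a concrete case, the following Python sketch checks them for one particular homomorphism. The choice of groups (the integers modulo 12 and modulo 4 under addition), the map f(x) = x mod 4, and the helper is_subgroup are all assumptions made for this illustration; the check covers only this single example.

```python
G = range(12)  # Z_12 under addition mod 12 (assumed model)
H = range(4)   # Z_4 under addition mod 4 (assumed model)

def f(x):
    """Reduce a residue mod 12 to a residue mod 4; well defined because 4 divides 12."""
    return x % 4

def is_subgroup(S, n):
    """Non-empty, closed under addition mod n, and closed under inverses mod n."""
    S = set(S)
    return (bool(S)
            and all((a + b) % n in S for a in S for b in S)
            and all((-a) % n in S for a in S))

# f preserves the group operation.
hom = all(f((a + b) % 12) == (f(a) + f(b)) % 4 for a in G for b in G)

# Part 1: the identity is sent to the identity.
identity_ok = f(0) == 0

# Part 2: inverses are sent to inverses.
inverses_ok = all(f((-a) % 12) == (-f(a)) % 4 for a in G)

# Part 3: the image of the subgroup A = {0, 6} of Z_12.
A = {0, 6}
image_A = {f(a) for a in A}

# Part 4: the inverse image of the subgroup B = {0, 2} of Z_4.
B = {0, 2}
preimage_B = {a for a in G if f(a) in B}

print(hom, identity_ok, inverses_ok, image_A, sorted(preimage_B))
```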


The most important method of combining functions is by composition. The following 
theorem shows that composition works nicely with respect to homomorphisms. 


Theorem 7.3.4. Let G, H and K be groups, and let f: G → H and j: H → K be 
homomorphisms. Then j ∘ f is a homomorphism. 


Proof. Left to the reader in Exercise 7.3.7. 


Our next goal is to give a useful criterion by which it can be verified whether a 
given homomorphism is injective. We start with the following definition. 


Definition 7.3.5. Let G and H be groups, and let f: G → H be a homomorphism. 
Let e_H be the identity element of H. The kernel of f, denoted ker f, is the set 
ker f = f⁻¹({e_H}). △ 


Observe that if f: G → H is a homomorphism, then by Theorem 7.3.3 (4) we 
know that ker f is always a subgroup of G, because {e_H} is a subgroup of H. 


Example 7.3.6. 


(1) Let h be as in Example 7.3.2 (2). The identity element of the group ((0, ∞), ·) 
is 1. Then ker h = h⁻¹({1}) = {0}. Observe that the function h is injective. 

(2) Let k be as in Example 7.3.2 (3). Then ker k = k⁻¹({e}) = {e, b}. This kernel 
is indeed a subgroup of V. We also compute that k⁻¹({a}) = ∅, that k⁻¹({c}) = ∅ 
and that k⁻¹({b}) = {a, c}; none of these three inverse images are subgroups of V. 
Observe that the function k is not injective. 


In Example 7.3.6 (1) we had an injective function, and the kernel was the triv- 
ial subgroup; in Part (2) of the example we had a non-injective function, and the 
kernel was non-trivial. The following theorem shows that this correlation between 
injectivity of homomorphisms and triviality of kernels always holds. 


Theorem 7.3.7. Let G and H be groups, and let f: G → H be a homomorphism. Let 
e_G be the identity element of G. The function f is injective if and only if ker f = {e_G}. 


Proof. Let e_H be the identity element of H, and let ∗ and ⋄ be the binary operations 
of G and H, respectively. Suppose that f is injective. Because f(e_G) = e_H by 
Theorem 7.3.3 (1), it follows from the injectivity of f that ker f = f⁻¹({e_H}) = {e_G}. 

Now suppose that ker f = {e_G}. Let a, b ∈ G, and suppose that f(a) = f(b). By 
Theorem 7.3.3 (2) and the definition of homomorphisms we see that 


f(b ∗ a′) = f(b) ⋄ f(a′) = f(a) ⋄ [f(a)]′ = e_H. 


It follows that b ∗ a′ ∈ f⁻¹({e_H}) = ker f. Because ker f = {e_G}, we deduce that 
b ∗ a′ = e_G. A similar calculation shows that a′ ∗ b = e_G. By Lemma 7.2.4 we deduce 
that (a′)′ = b, and therefore by Theorem 7.2.5 (3) we see that b = a. Hence f is 
injective. 


Theorem 7.3.7 tells us that the kernel provides an easy way to tell whether or 
not a homomorphism is injective. To tell whether an arbitrary function f: A → B of 
sets is injective, it would be both necessary and sufficient to verify that f⁻¹({b}) is 
either the empty set or a single element set for all b ∈ B. For homomorphisms, by 
contrast, it is necessary to check only one such set, namely, the kernel. 
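
The contrast between checking every inverse image and checking only the kernel can be illustrated with a short Python sketch. The groups and maps used here are assumptions chosen for the illustration: both maps go from the integers modulo 4 (under addition) to themselves, and the map called double plays the role of the function k of Example 7.3.2 (3) only under the assumed identification of e, a, b, c with 0, 1, 2, 3.

```python
Z4 = range(4)  # integers mod 4 under addition (assumed model)

def kernel(f, G, identity_h):
    """Inverse image of the identity element of the codomain."""
    return {a for a in G if f(a) == identity_h}

def fibers(f, G):
    """All non-empty inverse images f^{-1}({b}), keyed by b."""
    result = {}
    for a in G:
        result.setdefault(f(a), set()).add(a)
    return result

def is_injective(f, G):
    """Direct test: no two elements of G share an image."""
    images = [f(a) for a in G]
    return len(set(images)) == len(images)

double = lambda x: (2 * x) % 4  # a homomorphism with non-trivial kernel
triple = lambda x: (3 * x) % 4  # a homomorphism with trivial kernel

for name, f in [("double", double), ("triple", triple)]:
    print(name, kernel(f, Z4, 0), fibers(f, Z4), is_injective(f, Z4))
```

For these two maps the kernel test and the direct injectivity test agree, as Theorem 7.3.7 predicts.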

We now define what we mean by saying that two groups are “essentially the 
same.” 


Definition 7.3.8. Let G and H be groups. 


1. Let f: G → H be a function. The function f is an isomorphism (sometimes 
called a group isomorphism) if it is a homomorphism and it is bijective. 
2. The groups G and H are isomorphic if there is an isomorphism G → H. △ 


If two groups are isomorphic, there may be more than one isomorphism between 
the groups, as we will see in Example 7.3.9 (1); to prove that two groups are 
isomorphic, it is sufficient to find only one isomorphism between them. 


Example 7.3.9. 


(1) Let E denote the set of even integers. It is straightforward to verify that 
(E, +) is a group; we omit the details. We claim that (E, +) and (Z, +) are 
isomorphic. Let f: Z → E be defined by f(n) = 2n for all n ∈ Z. It is left to the 
reader to verify that f is bijective. To see that f is a homomorphism, observe that 
f(n + m) = 2(n + m) = 2n + 2m = f(n) + f(m) for all n, m ∈ Z. Hence f is an 
isomorphism, and therefore (E, +) and (Z, +) are isomorphic. The function f is not the 
only possible isomorphism Z → E. The reader can verify that the function g: Z → E 
defined by g(n) = −2n for all n ∈ Z is also an isomorphism. 

(2) Any two trivial groups, as discussed in Example 7.2.3 (2), are isomorphic. 
Let ({e}, ∗) and ({u}, ⋄) be trivial groups. Let g: {e} → {u} be defined by g(e) = u. 
Then g(e ∗ e) = g(e) = u = u ⋄ u = g(e) ⋄ g(e). Hence g is a homomorphism. Clearly 
g is bijective, and hence it is an isomorphism. 

(3) Because isomorphisms are bijective functions, we see that if two groups are 
isomorphic, then they have the same cardinality. Hence, two finite groups with 
different numbers of elements cannot possibly be isomorphic. However, just because 
two finite groups have the same number of elements does not automatically guarantee 
that they are isomorphic. For example, let Q = {1, x, y, z}, and let ⋄ be the binary 
operation on Q defined by the operation table 

      ⋄ | 1  x  y  z 
     ---+----------- 
      1 | 1  x  y  z 
      x | x  1  z  y 
      y | y  z  1  x 
      z | z  y  x  1 


It can be verified that (Q, ⋄) is a group; we omit the details. The group (V, ∘) in 
Example 7.2.3 (4) also has four elements, but it is shown in Exercise 7.3.9 that (Q, ⋄) 
and (V, ∘) are not isomorphic. Intuitively, these groups are different in that all four 
elements of Q are their own inverses, whereas in V only two elements (the elements 
e and b) are their own inverses. ♦ 
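
The intuition about self-inverse elements can be confirmed by brute force. The Python sketch below rests on two assumptions that are consistent with the description above but are not taken from the operation tables themselves: the group V is modelled by the integers modulo 4 under addition, and the group Q by pairs of integers modulo 2 under componentwise addition. The function names are likewise hypothetical.

```python
from itertools import permutations, product

# Assumed model of V: the cyclic group of order 4.
Z4 = list(range(4))
z4_op = lambda a, b: (a + b) % 4

# Assumed model of Q: the group Z_2 x Z_2 with componentwise addition.
K = list(product(range(2), repeat=2))
k_op = lambda a, b: ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

def self_inverse(elements, op, identity):
    """Elements x with x op x equal to the identity."""
    return [x for x in elements if op(x, x) == identity]

def isomorphisms(G, op_g, H, op_h):
    """Yield every bijection G -> H that preserves the operation."""
    for images in permutations(H):
        f = dict(zip(G, images))
        if all(f[op_g(a, b)] == op_h(f[a], f[b]) for a in G for b in G):
            yield f

n_self_z4 = len(self_inverse(Z4, z4_op, 0))
n_self_k = len(self_inverse(K, k_op, (0, 0)))
n_iso = len(list(isomorphisms(Z4, z4_op, K, k_op)))
n_auto = len(list(isomorphisms(Z4, z4_op, Z4, z4_op)))

print(n_self_z4, n_self_k, n_iso, n_auto)
```

Trying all 24 bijections is feasible only because the groups are tiny. The final count also illustrates the earlier remark that there may be more than one isomorphism between two isomorphic groups.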


In Example 7.3.9 (3) we saw that there are at least two non-isomorphic groups 
with four elements. As seen in Exercise 7.3.10, it turns out that every group with four 
elements is isomorphic to one of these two groups. In general, it is quite difficult to 
take a natural number, and to describe all possible non-isomorphic groups with that 
number of elements, or even to say how many such groups there are; simply checking 
all possible operation tables (as is done in Exercise 7.3.10) is neither feasible nor 
satisfying with more than a few elements. The results are known for sufficiently small 
groups (up to 100 elements, for example), but there is no formula for the number of 
non-isomorphic groups with n elements for arbitrary n. See [Dea66, Section 9.3] or 
[Rot96, p. 85] for details. 

We conclude this section with the following theorem, which gives some basic 
properties of isomorphisms. In the statement of Part (2) of the theorem, the function 
f⁻¹ is simply defined to be the inverse of the function f, because any bijective 
function has an inverse by Theorem 4.4.5 (3); hence f⁻¹ is defined without any regard to 
the fact that f is a homomorphism. 


Theorem 7.3.10. Let G, H and K be groups, and let f: G → H and j: H → K be 
isomorphisms. 


1. The identity map 1_G: G → G is an isomorphism. 
2. The function f⁻¹ is an isomorphism. 
3. The function j ∘ f is an isomorphism. 


Proof. Left to the reader in Exercise 7.3.8. 


Exercises 


Exercise 7.3.1. Which of the following functions are homomorphisms? Which of 
the homomorphisms are isomorphisms? The groups under consideration are (R, +), 
and (Q, +), and ((0, ∞), ·). 


(1) Let f: Q → (0, ∞) be defined by f(x) = 5^x for all x ∈ Q. 

(2) Let k: (0, ∞) → (0, ∞) be defined by k(x) = x⁷ for all x ∈ (0, ∞). 
(3) Let m: R → R be defined by m(x) = x + 3 for all x ∈ R. 

(4) Let g: (0, ∞) → R be defined by g(x) = ln x for all x ∈ (0, ∞). 

(5) Let h: R → R be defined by h(x) = |x| for all x ∈ R. 


Exercise 7.3.2. Let (GL₂(R), ·) be the group described in Example 7.1.2 (2). We 
know from Example 7.2.3 (1) that (R − {0}, ·) is a group. Prove that the function 
det: GL₂(R) → R − {0} is a homomorphism. What is the kernel of this function? 
(This exercise requires familiarity with basic properties of determinants.) 


Exercise 7.3.3. In this exercise we use the fact that (Z_n, +) is a group for all n ∈ N, 
as was proved in Exercise 7.2.12 (1). 


(1) Let j: Z_4 → Z_3 be defined by j([x]) = [x] for all [x] ∈ Z_4, where the two 
appearances of “[x]” in the definition of j refer to elements in different groups. 
Is this function well-defined? If it is well-defined, is it a homomorphism? If 
it is a homomorphism, find the kernel. 

(2) Let k: Z_6 → Z_3 be defined by k([x]) = [x] for all [x] ∈ Z_6. Is this function 
well-defined? If it is well-defined, is it a homomorphism? If it is a homomorphism, 
find the kernel. 

(3) Can you find criteria on n, m ∈ N that will determine when the function 
r: Z_n → Z_m defined by r([x]) = [x] for all [x] ∈ Z_n is well-defined and is a 
homomorphism? Prove your claim. Find the kernels for those functions that 
are well-defined and are homomorphisms. 


Exercise 7.3.4. Let G and H be groups. Prove that the projection maps π₁: G × H → 
G and π₂: G × H → H are homomorphisms (see Section 4.1 for the definition of 
projection maps). What is the kernel of each of these functions? 


Exercise 7.3.5. Prove that the two groups in each of the following pairs are isomor- 
phic to each other. 


(1) (Z, +) and (5Z, +), where 5Z = {5n | n ∈ Z}. 

(2) (R − {0}, ·) and (R − {−1}, ∗), where x ∗ y = x + y + xy for all x, y ∈ R − {−1}. 

(3) (R⁴, +) and (M_{2×2}(R), +), where M_{2×2}(R) is the set of all 2 × 2 matrices 
with real entries. 


Exercise 7.3.6. [Used in Theorem 7.3.3.] Prove Theorem 7.3.3 (1) (4). 
Exercise 7.3.7. [Used in Theorem 7.3.4.] Prove Theorem 7.3.4. 
Exercise 7.3.8. [Used in Theorem 7.3.10.] Prove Theorem 7.3.10. 


Exercise 7.3.9. [Used in Example 7.3.9.] Prove that the groups (V, ∘) in 
Example 7.2.3 (4) and (Q, ⋄) in Example 7.3.9 (3) are not isomorphic. 


Exercise 7.3.10. [Used in Section 7.3.] Prove that up to isomorphism, the only two 
groups with four elements are (V, ∘) of Example 7.2.3 (4) and (Q, ⋄) of 
Example 7.3.9 (3). Consider all possible operation tables for the binary operation of a 
group with four elements; use the fact that each element of a group appears once in 
each row and once in each column of the operation table for the binary operation of 
the group, as remarked after Theorem 7.2.5. 


7.4 Partially Ordered Sets 


In Sections 7.2 and 7.3 we discussed the concept of a group, which is an algebraic 
structure based on the notion of a binary operation. If we think of familiar number 
systems such as the natural numbers and real numbers, we observe that there is 
another type of structure on these sets, namely, the order relation ≤. In this section 
and the next we will discuss two important structures, called partially ordered sets 
and lattices, that are based on the notion of an order relation, rather than a binary 
operation. 

Order relations have widespread use in many areas of both pure and applied 
mathematics, such as combinatorics, boolean algebras, switching circuits, computer 
science and others. An interesting application of order relations to the theory of vot- 
ing is in [KR83a, Section 1.6], where a proof is given of the remarkable Arrow 
Impossibility Theorem (which says roughly that in an election with three or more 
candidates, no voting system satisfying certain reasonable conditions can exist). Be- 
cause of the widespread appearance of order relations in many combinatorial topics, 
they are often treated in texts on combinatorics, for example [Bog90, Chapter 7]. A 
treatment of order relations in the context of computer science is [DSW94, Chap- 
ter 16]. 

In Section 5.1 we discussed relations in general, and in particular three properties 
that a relation might satisfy, which are reflexivity, symmetry and transitivity. We 
now turn our attention to a particular type of relation, which generalizes the order 
relation ≤ on R. The relation ≤ is reflexive and transitive, but certainly not symmetric. 
Indeed, this relation is about as non-symmetric as can be, given the well-known 
property that if x, y ∈ R and both x ≤ y and y ≤ x hold, then x = y. In other words, if 
x ≠ y, it cannot happen that both x ≤ y and y ≤ x. 

Now compare the relation ≤ on R with the relation ⊆ on 𝒫(A). Observe that ⊆ is 
also reflexive and transitive, and is similarly non-symmetric, in that if X, Y ⊆ A and if 
X ⊆ Y and Y ⊆ X, then X = Y. Both these relations involve what would intuitively be 
called “order” on some set, and it is this notion of order that we wish to generalize. 
There is, however, one substantial difference between ≤ and ⊆. For any x, y ∈ R, 
we know that either x ≤ y or y ≤ x. On the other hand, for two arbitrary subsets 
X, Y ⊆ A, it might not be the case that either of X ⊆ Y or Y ⊆ X holds; for example, 
let A = {1, 2, 3, 4}, let X = {1, 2} and let Y = {3, 4}. Informally, for ≤ every two 
elements are “comparable,” whereas for ⊆ they are not necessarily so. Given that we 
want the broadest possible notion of order, we will not be requiring comparability in 
our most general definition. These ideas are all made precise as follows. 


Definition 7.4.1. Let A be a non-empty set, and let ≼ be a relation on A. 


1. The relation ≼ is antisymmetric if x ≼ y and y ≼ x imply that x = y, for all 
x, y ∈ A. 

2. The relation ≼ is a partial ordering (also called a partial order) if it is 
reflexive, transitive and antisymmetric. If ≼ is a partial ordering, the pair 
(A, ≼) is a partially ordered set, often abbreviated as poset. 

3. The relation ≼ is a total ordering (also called a total order or linear ordering) 
if it is a partial ordering, and if for every a, b ∈ A, at least one of a ≼ b 
or b ≼ a holds. If ≼ is a total ordering, the pair (A, ≼) is a totally ordered 
set. △ 


Formally, a poset is a pair (A, ≼). However, when the relation ≼ is understood 
from the context, or it is not important to designate the symbol for the relation, we 
will simply say “let A be a poset.” Similarly for totally ordered sets. We will primarily 
be looking at posets, rather than totally ordered sets, because the former are more 
prevalent orderings. Observe that posets and totally ordered sets are all assumed to 
be non-empty. 


Example 7.4.2. 


(1) The relation EF in Example 5.1.6 (5) is antisymmetric and reflexive, but it is 
not transitive, and hence it is not a partial ordering. 

(2) There are many relations that are reflexive and transitive but not antisymmet- 
ric. For instance, any equivalence relation that has non-equal elements that are related 
cannot be antisymmetric; the reader is asked to prove this fact in Exercise 7.4.5 (2). 
For example, the relation of congruence modulo n for any n ∈ N such that n ≠ 1 is 
reflexive and transitive, but not antisymmetric (see Section 5.2 for the definition of 
this relation). 

(3) Each of the sets N, Z, Q and R with the relation ≤ is a totally ordered set. 
The relation < on these sets is not a partial ordering, because it is not reflexive. 

(4) Let A be a set. Then (𝒫(A), ⊆) is a poset but, when A has at least two 
elements, not a totally ordered set, as mentioned previously. 

(5) The relation “a|b” on N is given in Definition 2.2.1. (The proper name for 
this relation is “|,” without the “variables,” but that would be awkward to read.) The 
relation is certainly reflexive, and it was shown in Theorem 2.2.2 that this relation 
is transitive. Let a, b ∈ N, and suppose that a|b and b|a. Then by Theorem 2.4.3 we 
know that a = b or a = −b. Because both a and b are positive, then it must be the case 
that a = b. Hence the relation is antisymmetric, and therefore it is a partial ordering. 
This relation is not a total ordering, however; for example, neither 2|3 nor 3|2 holds. 
Observe that the relation a|b on Z is not antisymmetric, because 3|(−3) and (−3)|3, 
and yet 3 ≠ −3. 

(6) Let W be the set of all words in the English language. Let ≼ be the relation 
on W defined as follows. If w₁ and w₂ are words, then w₁ ≼ w₂ if for some n ∈ N, the 
first n − 1 letters of w₁ and w₂ are the same, and the n-th letter of w₁ comes before 
the n-th letter of w₂ in the usual ordering of the letters of the alphabet (the second 
condition is dropped when w₁ = w₂). For example, we see that mandrel ≼ mandrill. 
This relation, which is seen to be a total ordering, is called the lexicographical order 
(also called the dictionary order). 
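
For a finite set, the conditions in Definition 7.4.1 can be checked mechanically. The following Python sketch does so for several of the relations discussed in this example; the function names and the particular finite sets are assumptions made for the illustration. In particular, the infinite sets N and Z are replaced by small finite samples, and 0 is left out of the sample of Z to avoid division by zero.

```python
from itertools import combinations, product

def is_reflexive(A, rel):
    return all(rel(x, x) for x in A)

def is_antisymmetric(A, rel):
    return all(not (rel(x, y) and rel(y, x)) or x == y
               for x, y in product(A, repeat=2))

def is_transitive(A, rel):
    return all(not (rel(x, y) and rel(y, z)) or rel(x, z)
               for x, y, z in product(A, repeat=3))

def is_partial_order(A, rel):
    return (is_reflexive(A, rel) and is_antisymmetric(A, rel)
            and is_transitive(A, rel))

def is_total_order(A, rel):
    return is_partial_order(A, rel) and all(
        rel(x, y) or rel(y, x) for x, y in product(A, repeat=2))

divides = lambda a, b: b % a == 0
leq = lambda a, b: a <= b
lt = lambda a, b: a < b
congruent_mod_3 = lambda a, b: (a - b) % 3 == 0
subset = lambda X, Y: X <= Y

positives = list(range(1, 13))                 # finite sample of N
nonzero = [n for n in range(-6, 7) if n != 0]  # finite sample of Z without 0
subsets = [frozenset(c) for r in range(4)
           for c in combinations([1, 2, 3], r)]  # all subsets of {1, 2, 3}

print(is_partial_order(positives, divides), is_total_order(positives, divides))
print(is_antisymmetric(nonzero, divides))
print(is_total_order(positives, leq), is_partial_order(positives, lt))
print(is_antisymmetric(positives, congruent_mod_3))
print(is_partial_order(subsets, subset), is_total_order(subsets, subset))
```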


A nice way to visualize finite posets is via Hasse diagrams. To construct these 
diagrams we need the following definition. 


Definition 7.4.3. Let (A, ≼) be a poset, and let a, b ∈ A. The element b covers the 
element a if a ≼ b, and a ≠ b, and there is no x ∈ A such that a ≼ x ≼ b and 
a ≠ x ≠ b. △ 


We form the Hasse diagram of a finite poset as follows. First, put a dot on the 
page for each element of the poset, placed in such a way that if x ≼ y and x ≠ y then y is higher 
on the page than x, though not necessarily directly above it. Second, connect the dots 
representing elements x and y by a line segment if and only if y covers x. 


Example 7.4.4. 


(1) Let A = {2, 4, 6, 8, 10, 12}, and let ≼ be the relation a|b discussed in 
Example 7.4.2 (5). By the argument given in that example, we know that (A, ≼) is a poset. 
The Hasse diagram for this poset is given in Figure 7.4.1 (i). Observe that there is 
no line segment from 2 to 8, even though 2 ≼ 8, because 8 does not cover 2. Also, 
observe that the placement of the dots on the page is not unique. Figure 7.4.1 (ii) 
shows another possible Hasse diagram for the same poset. 

(2) For finite posets with small numbers of elements, Hasse diagrams allow us a 
convenient way to list all possible inequivalent posets of a given size; a rigorous def- 
inition of “inequivalent” needs the notion of order preserving functions defined later 
in this section, but we will use the term informally here. The Hasse diagrams of all 
inequivalent posets with 3 elements are given in Figure 7.4.2. (Of course, the Hasse 
diagrams are not the posets themselves, but they accurately represent the posets.) 


[Fig. 7.4.1. Two Hasse diagrams, (i) and (ii), of the poset in Example 7.4.4 (1).] 

[Fig. 7.4.2. Hasse diagrams (i), (ii), (iii), (iv) and (v) of the posets with 3 elements.] 

Whenever we have a notion of order on a set, it is tempting to look for largest ele- 
ments and smallest elements of various types. The most basic type of largest element 
or smallest element is given in the following definition. 


Definition 7.4.5. Let (A, ≼) be a poset, and let a ∈ A. The element a is a greatest 
element of A if x ≼ a for all x ∈ A. The element a is a least element of A if a ≼ x for 
all x ∈ A. △ 


Not every poset has a greatest element or a least element, as we now see. 


Example 7.4.6. The poset (Z, ≤) has no greatest element or least element. Even 
finite posets need not have greatest elements or least elements. For example, the 
poset in Example 7.4.4 (1) does not have a greatest element; observe that 12 is not 
a greatest element with respect to the relation a|b, because 10 does not divide 12. 
The poset does have a least element, the number 2, because 2 divides all the other 
numbers in the set. ♦ 


We now turn to a definition that is slightly weaker than Definition 7.4.5. The 
following definition generalizes Definition 3.5.4 (1). 


Definition 7.4.7. Let (A, ≼) be a poset, and let a ∈ A. The element a is a maximal 
element of A if there is no x ∈ A such that a ≼ x and a ≠ x. The element a is a 
minimal element of A if there is no x ∈ A such that x ≼ a and a ≠ x. △ 


Example 7.4.8. The poset (Z, ≤) has no maximal element or minimal element. Let 
(A, ≼) be the poset in Example 7.4.4 (1). The elements 8, 10 and 12 are all maximal 
elements, which shows that maximal elements need not be unique, and also that 
maximal elements need not be greatest elements. The element 2 is a minimal element, 
which also happens to be a least element. ♦ 
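
The difference between Definition 7.4.5 and Definition 7.4.7 shows up clearly when both are computed for the same finite poset. The Python sketch below does this for the set {2, 4, 6, 8, 10, 12} under divisibility; the four function names are assumptions of the sketch, and each function is a direct transcription of the corresponding definition.

```python
def greatest_elements(A, rel):
    """Elements a with x rel a for every x in A (at most one in a poset)."""
    return [a for a in A if all(rel(x, a) for x in A)]

def least_elements(A, rel):
    """Elements a with a rel x for every x in A."""
    return [a for a in A if all(rel(a, x) for x in A)]

def maximal_elements(A, rel):
    """Elements a with nothing strictly above them."""
    return [a for a in A if not any(rel(a, x) and a != x for x in A)]

def minimal_elements(A, rel):
    """Elements a with nothing strictly below them."""
    return [a for a in A if not any(rel(x, a) and a != x for x in A)]

divides = lambda a, b: b % a == 0
A = [2, 4, 6, 8, 10, 12]

print(greatest_elements(A, divides), least_elements(A, divides))
print(maximal_elements(A, divides), minimal_elements(A, divides))
```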


Although not every poset has a maximal element or minimal element, the follow- 
ing theorem shows that such elements always exist in finite posets. 


Theorem 7.4.9. Let (A, ≼) be a poset. Suppose that A is finite. Then A has a maximal 
element and a minimal element. 


Proof. We will prove the existence of maximal elements; the existence of minimal 
elements is similar, and we omit the details. Let n = |A|. We proceed by induction on 
n. If n = 1, then the single element of A is clearly a maximal element. Now assume 
that n ≥ 2. Suppose that the result is true for n − 1. Let w ∈ A, and let B = A − {w}. 
By Exercise 7.4.8 we know that (B, ≼) is a poset. Because |B| = n − 1, it follows 
from the inductive hypothesis that there is a maximal element p of B. We now define 
r ∈ A as follows. If p ≼ w, let r = w; if it is not the case that p ≼ w, then let r = p. 
We claim that r is a maximal element of A. There are two cases. First, suppose that 
p ≼ w. Then r = w. Suppose that there is some y ∈ A such that w ≼ y and w ≠ y. By 
transitivity it follows that p ≼ y, and by antisymmetry it follows that p ≠ y. Because 
y ≠ w, then y ∈ B, and we then have a contradiction to the fact that p is a maximal 
element of B. It follows that w is a maximal element of A. Second, suppose that it is 
not the case that p ≼ w. Then r = p. Because p is a maximal element of B, then there 
is no x ∈ B such that p ≼ x and p ≠ x. It follows that there is no x ∈ A = B ∪ {w} such 
that p ≼ x and p ≠ x, and hence p is a maximal element of A. 
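
The induction in this proof translates into a short recursive procedure: remove an element w, find a maximal element p of what remains, and keep w instead of p exactly when p lies below w. The following Python sketch follows that recipe; the function names, the choice of w as the first listed element, and the test poset are assumptions made for the illustration.

```python
def maximal_element(A, rel):
    """Find one maximal element of a finite non-empty poset, as in the induction."""
    A = list(A)
    if not A:
        raise ValueError("a poset is assumed to be non-empty")
    if len(A) == 1:
        return A[0]            # base case n = 1
    w, B = A[0], A[1:]         # remove an element w, leaving B = A - {w}
    p = maximal_element(B, rel)  # inductive hypothesis applied to B
    return w if rel(p, w) else p

def is_maximal(a, A, rel):
    """Definition 7.4.7, used here only to check the result."""
    return not any(rel(a, x) and a != x for x in A)

divides = lambda a, b: b % a == 0
A = [2, 4, 6, 8, 10, 12]

r1 = maximal_element(A, divides)
r2 = maximal_element(list(reversed(A)), divides)
print(r1, r2)
```

Listing the elements in a different order can produce a different maximal element, which is consistent with the fact that maximal elements need not be unique.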


We now look at another concept that is related to the idea of an element being 
larger than everything in a collection of elements, and similarly for elements that are 
smaller than others. This new concept turns out to be extremely useful, both in our 
study of lattices in Section 7.5, and, as mentioned briefly in Example 7.4.11 (2), in 
the field of real analysis. 


Definition 7.4.10. Let (A, ≼) be a poset, let X ⊆ A be a subset and let a ∈ A. The 
element a is an upper bound of X if x ≼ a for all x ∈ X. The element a is a least 
upper bound of X if it is an upper bound of X, and a ≼ z for any other upper bound 
z of X. The element a is a lower bound of X if a ≼ x for all x ∈ X. The element a 
is a greatest lower bound for X if it is a lower bound of X, and w ≼ a for any other 
lower bound w of X. △ 


Example 7.4.11. 


(1) Let A be a set. Every subset of the poset (𝒫(A), ⊆) has a greatest lower bound 
and a least upper bound. Let X ⊆ 𝒫(A). Then X is a family of subsets of A. It follows 
from Theorem 3.4.5 (1) (2) that ⋃_{D∈X} D is a least upper bound of X, and that 
⋂_{D∈X} D is a greatest lower bound of X. 

(2) We start by looking at subsets of the poset (Q, ≤). Let X = {1/2, 2/2, 3/2, 4/2, 5/2, ...}. 
Then X has no upper bound in Q, and hence no least upper bound. This set has many 
lower bounds, for example −17 and 0, and it has a greatest lower bound, which is 1/2. 
Let Y = {x ∈ Q | 1 < x < 3}. Then Y has many upper and lower bounds; it has a least 
upper bound, which is 3, and a greatest lower bound, which is 1. In contrast to the 
set X, which contains its greatest lower bound, the set Y contains neither its greatest 
lower bound nor its least upper bound. Let Z = {x ∈ Q | 0 < x < √2}. Then Z has a 
greatest lower bound, which is 0. However, even though Z has many upper bounds 
in Q, for example 2 and 3, the set Z has no least upper bound in Q, which can be 
seen using the fact that √2 ∉ Q, as was proved in Theorem 2.3.5. 

Now consider the poset (R, ≤). Let Z′ = {x ∈ R | 0 < x < √2}. In contrast to the 
subset Z of Q, which has upper bounds in Q but no least upper bound, the subset Z′ 
of R has a least upper bound in R, which is √2. Indeed, what distinguishes R from 
Q is precisely the fact that in R, if a non-empty subset has an upper bound then it 
must have a least upper bound, and similarly for lower bounds. This property of R, 
known as the Least Upper Bound Property, is crucial in the field of real analysis, where 
the results of calculus are proved rigorously; see any introductory real analysis text, for 
example [Blo11, Section 2.6], for details. 

(3) As shown in Example 7.4.2 (5), the set N with the relation “a|b” is a poset. 
Let a₁, ..., a_p ∈ N, for some p ∈ N. Then the greatest common divisor of a₁, ..., a_p 
is a greatest lower bound of {a₁, ..., a_p}, and the least common multiple of these 
numbers is a least upper bound of {a₁, ..., a_p}. On the other hand, if X ⊆ N is an 
infinite subset, then X will have no upper bound, and hence it will not have a least 
upper bound, though it will have a greatest lower bound (which will still be the 
greatest common divisor of all the elements of X). ♦ 
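
The claim in Part (3) about greatest common divisors and least common multiples can be compared with a direct search based on Definition 7.4.10. The Python sketch below does this for one finite set; the set {4, 6, 10}, the function names, and above all the replacement of N by the finite window 1, ..., 120 are assumptions of the sketch. The window is chosen large enough to contain the least common multiple, and the search would give misleading results for a window that is too small.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def divides(a, b):
    return b % a == 0

def least_upper_bounds(X, universe):
    """Upper bounds of X (under divisibility) that divide every upper bound."""
    uppers = [a for a in universe if all(divides(x, a) for x in X)]
    return [a for a in uppers if all(divides(a, z) for z in uppers)]

def greatest_lower_bounds(X, universe):
    """Lower bounds of X (under divisibility) that every lower bound divides."""
    lowers = [a for a in universe if all(divides(a, x) for x in X)]
    return [a for a in lowers if all(divides(w, a) for w in lowers)]

X = [4, 6, 10]
window = range(1, 121)  # finite stand-in for N (assumption of this sketch)

lub = least_upper_bounds(X, window)
glb = greatest_lower_bounds(X, window)
print(lub, reduce(lcm, X))
print(glb, reduce(gcd, X))
```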


We see from Example 7.4.11 that not every subset of a poset has a least upper 
bound or a greatest lower bound. The following lemma shows that if a least upper 
bound or a greatest lower bound of a subset exists, then it is unique. 


Lemma 7.4.12. Let (A, ≼) be a poset, and let X ⊆ A be a subset. If X has a least 
upper bound, then it is unique, and if X has a greatest lower bound, then it is unique. 


Proof. Let p, q ∈ A, and suppose that both are least upper bounds of X. By definition 
both p and q are upper bounds for X. Because p is a least upper bound of X, and q is 
an upper bound of X, then p ≼ q by the definition of least upper bounds. Similarly, 
we see that q ≼ p. By antisymmetry, it follows that p = q. A similar argument works 
for greatest lower bounds; we omit the details. 


Because of Lemma 7.4.12, we can refer to “the least upper bound” and “the 
greatest lower bound” of a subset of a poset, whenever a least upper bound and a 
greatest lower bound exist. It is standard to write lub X and glb X to denote the least 
upper bound and the greatest lower bound respectively for a subset X of a poset, 
though we will not need that notation in this book. 

What is the relation between posets and totally ordered sets? Clearly, every totally 
ordered set is a poset. The converse is certainly not true, as seen in Example 7.4.2 (4). 
However, the following theorem shows that every finite poset can be “expanded” into 
a totally ordered set. 


Theorem 7.4.13. Let (A, ≼) be a poset. Suppose that A is finite. Then there is a total 
ordering ≼′ on A such that if x ≼ y then x ≼′ y, for all x, y ∈ A. 


Proof. Let n = |A|. We proceed by induction on n. If n = 1 the result is trivial. 
Now assume that n ≥ 2. Suppose that the result is true for n − 1. By Theorem 7.4.9 
the poset A has a maximal element, say r ∈ A. Let B = A − {r}. By Exercise 7.4.8 
we know that (B, ≼) is a poset. Because |B| = n − 1, it follows from the inductive 
hypothesis that there is a total ordering ≼″ on B such that if x ≼ y then x ≼″ y, for all 
x, y ∈ B. Now define a relation ≼′ on A as follows. If x, y ∈ B, let x ≼′ y if and only if 
x ≼″ y. If x ∈ A, let x ≼′ r. It is left to the reader in Exercise 7.4.9 to show that ≼′ is 
a total order on A, and that if x ≼ y then x ≼′ y, for all x, y ∈ A. 


Theorem 7.4.13 states that any finite poset can be given a total ordering that in- 
cludes the original partial ordering; such a total ordering is often referred to as a 
linear ordering of the original poset. A poset can have more than one linear ordering. 
A close look at the proof of Theorem 7.4.13 shows that we actually gave an algorith- 
mic procedure for finding a linear ordering of a given poset. This is not the only (nor 
the best) such algorithm, though it is a very simple one. Such algorithms are useful in 
the theory of posets, as well as in applications of posets to computer science (where
finding a linear ordering is known as topological sorting); see [Knu73, pp. 258-268]
for discussion of the latter. 
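
The procedure in the proof of Theorem 7.4.13 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names linear_extension and divides are ours, the poset is given as a list of elements together with a function leq(x, y), and the recursion of the proof is unrolled into a loop that repeatedly removes a maximal element.

```python
def linear_extension(elements, leq):
    """Return the elements listed from least to greatest in a total order
    that extends the partial order leq (a function leq(x, y) -> bool)."""
    remaining = list(elements)
    removed = []  # maximal elements, in the order they were peeled off
    while remaining:
        # A maximal element has no other remaining element above it;
        # a finite non-empty poset always has one (Theorem 7.4.9).
        r = next(x for x in remaining
                 if not any(y != x and leq(x, y) for y in remaining))
        remaining.remove(r)
        removed.append(r)
    # The first element removed is the greatest, so reverse the list.
    return removed[::-1]


def divides(a, b):
    """The relation a|b on natural numbers (illustrative example poset)."""
    return b % a == 0


# One linear ordering of the divisors of 12 under divisibility.
order = linear_extension([1, 2, 3, 4, 6, 12], divides)
print(order)
```

Which maximal element is picked at each stage is arbitrary (here, the first one found), so other linear orderings of the same poset are equally valid outputs.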


Example 7.4.14. Let (A, ≼) be the poset corresponding to the Hasse diagram in
Figure 7.4.2 (ii). We will apply the algorithm in the proof of Theorem 7.4.13 to this
poset. First, we need to choose a maximal element of A. There are two such elements,
which are a and c. Let us choose the element a. Then let B = A − {a} = {b, c}. We
now need a total ordering on B that includes the given partial ordering on B. Such a
total ordering is quite easy to obtain, given that B has only two elements, and neither
element is greater than the other in the given partial ordering. Again, there is a choice
to be made, and we will choose the total ordering ≼″ on B defined by b ≼″ c. We
now define ≼′ on A by first letting b ≼′ c, and then, because a is the chosen maximal
element of A, letting b ≼′ a and c ≼′ a. The Hasse diagram of the totally ordered set
(A, ≼′) is given in Figure 7.4.2 (v).

Different choices in the above procedure would have yielded different total or- 
derings on A. For example, if we had chosen the maximal element c instead of a, the 
resulting total ordering would have been b ≼′ a ≼′ c. ♦


In Section 7.3 we discussed homomorphisms and isomorphisms of groups, which 
are functions between groups that preserved the group operation. We can similarly 
discuss functions between posets that preserve their basic structures. Our treatment 
here partially follows [Szá63, Section 20].


Definition 7.4.15. Let (A, ≼) and (B, ≼′) be posets, and let f : A → B be a function.
The function f is an order homomorphism (also called an order preserving
function) if x ≼ y implies f(x) ≼′ f(y), for all x, y ∈ A. The function f is an order
isomorphism if it is bijective, and if both f and f⁻¹ are order homomorphisms. △


Two posets are considered essentially the same if there is an order isomorphism 
between them. 

The following useful lemma is a direct consequence of Definition 7.4.15, and we 
omit the proof. 


Lemma 7.4.16. Let (A, ≼) and (B, ≼′) be posets, and let f : A → B be a function.
Then f is an order isomorphism if and only if f is a bijective function, and x ≼ y if
and only if f(x) ≼′ f(y) for all x, y ∈ A.


Example 7.4.17. 


(1) Let P_F(N) denote the family of all finite subsets of N. Then (P_F(N), ⊆) is
a poset. We saw in Example 7.4.2 (3) that (Z, ≤) is a poset. Let s : P_F(N) → Z be
defined by s(X) = |X| for all X ∈ P_F(N). It follows from Theorem 6.6.5 (3) that the
function s is an order homomorphism. The function s is not bijective, however, so it
is not an order isomorphism.

(2) Let A = {a, b}, let D = {1, 2, 3, 6} and let ≼ be the relation on D given by a|b.
Then (P(A), ⊆) and (D, ≼) are posets, as seen in Example 7.4.2 (4) (5). Let f : D →
P(A) be defined by f(1) = ∅, and f(2) = {a}, and f(3) = {b} and f(6) = {a, b}. It
is left to the reader to verify that f is an order isomorphism.

(3) Observe that (N, =) is a poset. We also know that (N, ≤) is a poset, as stated
in Example 7.4.2 (3). The identity map 1_N : N → N is then seen to be an order
homomorphism from the poset (N, =) to the poset (N, ≤). The function 1_N is also
bijective, and clearly (1_N)⁻¹ = 1_N. However, if we think of the function 1_N in its
role as (1_N)⁻¹, then we observe that this inverse function is not an order homomorphism
from (N, ≤) to (N, =). For example, we observe that 5 ≤ 7, but 1_N(5) ≠ 1_N(7).
We therefore see that a bijective order homomorphism need not have its inverse
automatically be an order homomorphism. Hence, the definition of order isomorphism is
not redundant. (If the reader is familiar with group isomorphisms, as in Section 7.3,
or with linear maps, then this example may seem rather strange. For both groups
and vector spaces, if a function is bijective and a homomorphism, then its inverse is
automatically a homomorphism as well; see Theorem 7.3.10 (2) for the group case.
Homomorphisms of posets, we now see, are not as well-behaved.) ♦
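
For finite posets, Definition 7.4.15 can be checked by brute force. The following Python sketch is only an illustration: the helper names is_order_hom and is_order_iso are ours, functions are represented as dictionaries, and Part (3) of the example is tested on the finite set {1, ..., 10} rather than on all of N.

```python
from itertools import product


def is_order_hom(f, dom, leq_dom, leq_cod):
    """x <=_dom y implies f(x) <=_cod f(y), for all x, y in dom."""
    return all(leq_cod(f[x], f[y])
               for x, y in product(dom, repeat=2) if leq_dom(x, y))


def is_order_iso(f, dom, cod, leq_dom, leq_cod):
    """f is bijective, and both f and its inverse are order homomorphisms."""
    values = [f[x] for x in dom]
    if len(set(values)) != len(values) or set(values) != set(cod):
        return False
    inverse = {f[x]: x for x in dom}
    return (is_order_hom(f, dom, leq_dom, leq_cod)
            and is_order_hom(inverse, cod, leq_cod, leq_dom))


# Part (2): divisors of 6 under divisibility versus subsets of {a, b}.
D = [1, 2, 3, 6]
PA = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")]
f = {1: frozenset(), 2: frozenset("a"), 3: frozenset("b"), 6: frozenset("ab")}
divides = lambda a, b: b % a == 0
subset = lambda X, Y: X <= Y
print(is_order_iso(f, D, PA, divides, subset))

# Part (3), restricted to {1, ..., 10}: identity from (=) to (<=).
N10 = list(range(1, 11))
identity = {n: n for n in N10}
equal = lambda a, b: a == b
at_most = lambda a, b: a <= b
print(is_order_hom(identity, N10, equal, at_most))
print(is_order_iso(identity, N10, N10, equal, at_most))
```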


We conclude this section with a nice result about order homomorphisms. To ap- 
preciate this result, recall from Figure 7.4.2 that there are a number of distinct partial 
orderings on a set with 3 elements, and of course there are even larger numbers of 
distinct partial orderings on larger finite sets. However, only one of the partial order- 
ings in Figure 7.4.2 is a total ordering, namely, the one corresponding to the Hasse 
diagram that is a single vertical line. The following theorem says, not surprisingly, 
that a similar result holds for all finite sets. 


Theorem 7.4.18. Let (A, ≼) be a totally ordered set. Suppose that A is finite. Let
n = |A|. Then there is an order isomorphism from (A, ≼) to ({1, 2, ..., n}, ≤).
Proof. We follow [KR83a]. We prove the result by induction on n. When n = 1 the
result is trivial. Now assume that n ≥ 2. Suppose that the result holds for n − 1.

By Theorem 7.4.9 the poset A has a maximal element, say r ∈ A. Let x ∈ A.
Because ≼ is a total ordering, we know that x ≼ r or r ≼ x. If it were the case that
r ≼ x, then by hypothesis on r we would know that r = x. Hence x ≼ r.

Let B = A − {r}. By Exercise 7.4.8 we know that (B, ≼) is a poset. Because
|B| = n − 1, it follows from the inductive hypothesis that there is an order isomorphism
from (B, ≼) to ({1, 2, ..., n − 1}, ≤), say f : B → {1, 2, ..., n − 1}. Let
F : A → {1, 2, ..., n} be defined by F(x) = f(x) for all x ∈ B, and F(r) = n.

Because f is bijective, it is straightforward to see that F is bijective as well; we
omit the details. To see that F is an order isomorphism, it suffices by Lemma 7.4.16
to show that x ≼ y if and only if F(x) ≤ F(y), for all x, y ∈ A. First, let x, y ∈ B. Then
x ≼ y if and only if f(x) ≤ f(y) because f is an order isomorphism. Because F(x) =
f(x) and F(y) = f(y), then x ≼ y if and only if F(x) ≤ F(y). Now let z ∈ B. We know
that z ≼ r, and we also know that F(z) ≤ n = F(r), because F(z) ∈ {1, 2, ..., n − 1}.
Hence z ≼ r if and only if F(z) ≤ F(r), because both these statements are true. It
follows that F is an order isomorphism.


The analog of Theorem 7.4.18 for infinite sets is not true. For example, as the 
reader is asked to show in Exercise 7.4.16, there is no order isomorphism from the 
totally ordered set (N, ≤) to the totally ordered set (N⁻, ≤), where N⁻ denotes the set
of negative integers, even though both sets have the same cardinality. 


Exercises 


Exercise 7.4.1. Is each of the relations given in Exercise 5.1.3 antisymmetric, a 
partial ordering and/or a total ordering? 


Exercise 7.4.2. Is each of the following relations antisymmetric, a partial ordering 
and/or a total ordering? 


(1) Let F be the set of people in France, and let M be the relation on F defined
by x M y if and only if x eats more cheese annually than y, for all x, y ∈ F.

(2) Let W be the set of all people who ever lived and ever will live, and let A be
the relation on W defined by x A y if and only if y is an ancestor of x or if
y = x, for all x, y ∈ W.

(3) Let T be the set of all triangles in the plane, and let L be the relation on T
defined by s L t if and only if s has area less than or equal to t, for all triangles
s, t ∈ T.

(4) Let U be the set of current U.S. citizens, and let Z be the relation on U defined
by x Z y if and only if the Social Security number of x is greater than the Social
Security number of y, for all x, y ∈ U.


Exercise 7.4.3. [Used in Exercise 7.4.4, Exercise 7.4.15 and Example 7.5.2.] Let A ⊆ N
be a subset, and let ≼ be the relation on A defined by a ≼ b if and only if b = a^k for
some k ∈ N, for all a, b ∈ A. Prove that (A, ≼) is a poset. Is (A, ≼) a totally ordered
set?
Exercise 7.4.4. Draw a Hasse diagram for each of the following posets.


(1) The set A = {1, 2, 3, ..., 15}, and the relation a|b.

(2) The set B = {1, 2, 3, 4, 6, 8, 12, 24}, and the relation a|b.

(3) The set C = {1, 2, 4, 8, 16, 32, 64}, and the relation a|b.

(4) The set C = {1, 2, 4, 8, 16, 32, 64}, and the relation ≼ defined by a ≼ b if and
only if b = a^k for some k ∈ N, for all a, b ∈ C. (It was proved in Exercise 7.4.3
that (C, ≼) is a poset.)

(5) The set P({1, 2, 3}), and the relation ⊆.


Exercise 7.4.5. [Used in Example 7.4.2.] 


(1) Give an example of a relation on R that is transitive and antisymmetric but 
neither symmetric nor reflexive. 

(2) Let A be a non-empty set, and let R be a relation on A. Suppose that R is 
both symmetric and antisymmetric. Prove that every element of A is related 
at most to itself. 


Exercise 7.4.6. 


(1) Prove that if a poset has a greatest element, then the greatest element is
unique, and if a poset has a least element, then the least element is unique.

(2) Find an example of a poset that has both a least element and a greatest el- 
ement, an example that has a least element but not a greatest element, an 
example that has a greatest element but not a least element and an example 
that has neither. 


Exercise 7.4.7. Prove that a greatest element of a poset is a maximal element, and 
that a least element of a poset is a minimal element. 


Exercise 7.4.8. [Used in Theorem 7.4.9, Theorem 7.4.13 and Theorem 7.4.18.] Let
(A, ≼) be a poset, and let B ⊆ A be a subset. The relation ≼ is defined by a subset
R ⊆ A × A. Then R ∩ (B × B) defines a relation on B, which can be thought of as the
restriction of ≼ to B; for convenience, because no confusion arises, we will also
denote this relation on B by ≼. Prove that (B, ≼) is a poset.


Exercise 7.4.9. [Used in Theorem 7.4.13.] Complete the missing step in the proof of
Theorem 7.4.13. That is, let ≼′ be as defined in the proof of the theorem. Prove that
≼′ is a total order on A, and that if x ≼ y then x ≼′ y, for all x, y ∈ A.


Exercise 7.4.10. Let A be a non-empty set, and let R be a relation on A. The relation 
R is a quasi-ordering if it is reflexive and transitive. 

Suppose that R is a quasi-ordering. Let ~ be the relation on A defined by x ~ y if
and only if x R y and y R x, for all x, y ∈ A.


(1) Prove that ~ is an equivalence relation.

(2) Let x, y, a, b ∈ A. Prove that if x R y, and x ~ a, and y ~ b, then a R b.

(3) Form the quotient set A/~, as defined in Definition 5.3.6. Let S be the relation
on A/~ defined by [x] S [y] if and only if x R y. Prove that S is well-defined.

(4) Prove that (A/~, S) is a poset.


Exercise 7.4.11. Let (A, ≼) be a poset. For each X ⊆ A, let Prec(X) be the set defined
by
Prec(X) = {w ∈ A | w ≼ x and w ≠ x for all x ∈ X}.


Let C, D ⊆ A. Prove that Prec(C ∪ D) = Prec(C) ∩ Prec(D).


Exercise 7.4.12. Let (A, ≼) be a poset. Let f : A → P(A) be defined by f(x) = {y ∈
A | y ≼ x} for all x ∈ A.


(1) Let x, z ∈ A. Prove that x ≼ z if and only if f(x) ⊆ f(z).
(2) Prove that f is injective.


Exercise 7.4.13. Let (A, ≼) be a poset, let X be a set and let h : X → A be a function.
Let ≼′ be the relation on X defined by x ≼′ y if and only if h(x) ≼ h(y), for all x, y ∈ X.
Prove that (X, ≼′) is a poset.


Exercise 7.4.14. Let (A, ≼) be a poset, and let X be a set. Let F(X, A) be as defined
in Section 4.5. Let ≼′ be the relation on F(X, A) defined by f ≼′ g if and only if
f(x) ≼ g(x) for all x ∈ X, for all f, g ∈ F(X, A). Prove that (F(X, A), ≼′) is a poset.


Exercise 7.4.15. Let ≼ denote the relation a|b on N, and let ≼′ be the relation on N
defined by a ≼′ b if and only if b = a^k for some k ∈ N, for all a, b ∈ N. (It was proved
in Exercise 7.4.3 that (N, ≼′) is a poset.) Prove that the identity map 1_N : N → N is an
order homomorphism from (N, ≼′) to (N, ≼), but that it is not an order isomorphism.


Exercise 7.4.16. [Used in Section 7.4.] Let N⁻ be the set of negative integers. Prove
that there is no order isomorphism from the poset (N, ≤) to the poset (N⁻, ≤).


Exercise 7.4.17. Let (A, ≼) and (B, ≼′) be posets, and let f : A → B be an order
isomorphism. Prove that if ≼ is a total order, then so is ≼′.


Exercise 7.4.18. [Used in Section 3.5.] The Well-Ordering Theorem states that for 
any set A, there is a total ordering on the set A such that every subset of A has a least 
element. Prove that the Well-Ordering Theorem implies the Axiom of Choice (use 
the version given in Theorem 3.5.3). Recall that the Axiom of Choice is not needed 
when there is a specific procedure for selecting elements from sets. 


7.5 Lattices 


In this section we turn our attention to a special type of poset, in which certain least 
upper bounds and greatest lower bounds exist. 


Definition 7.5.1. Let (A, ≼) be a poset.


1. Let a, b ∈ A. The join of a and b, denoted a ∨ b, is the least upper bound of
{a, b}, if the least upper bound exists; the join is not defined if the least upper
bound does not exist. The meet of a and b, denoted a ∧ b, is the greatest lower
bound of {a, b}, if the greatest lower bound exists; the meet is not defined if
the greatest lower bound does not exist.
2. The poset (A, ≼) is a lattice if a ∧ b and a ∨ b exist for all a, b ∈ A. △


The symbols for meet and join are the same symbols that we used for “and” and 
“or” in Chapter 1. Both usages are quite standard, and no confusion should arise, 
because the context should be clear in every situation. That the same symbols are used
in both settings is not entirely coincidental, however, because meet and join play roles
analogous to “and” and “or,” though the former do not satisfy all the properties of the
latter.

Lattices are an extremely useful type of poset. A nice introduction to lattices 
and their applications, including a brief history of lattice theory, is [LP98, Chap- 
ters 1 and 2]; some applications mentioned include probability and boolean algebras 
(the latter, defined in Exercise 7.5.11, are a special type of lattice, and are of use in 
areas such as logic and switching circuits). A classic text on lattices is [Bir48];
another comprehensive text is [Szá63]. For a combinatorial perspective on lattices see
[Bog90, Chapter 7], where some lattices related to graphs and partitions are given. 


Example 7.5.2. 


(1) The sets N, Z, Q and R with the relation ≤ are all lattices. We know from
Example 7.4.2 (3) that these sets with the relation ≤ are all posets. Let x and y be
two numbers in any one of these sets. If x = y then x ∧ y = x = y and x ∨ y = x = y;
if x ≠ y, then x ∧ y is the smaller of the two numbers, and x ∨ y is the larger. More
generally, any totally ordered set is a lattice, by the same argument.

(2) Let A be a set. The poset (P(A), ⊆) is a lattice. If X, Y ∈ P(A), then X ∧ Y =
X ∩ Y and X ∨ Y = X ∪ Y.

(3) As shown in Example 7.4.2 (5), the set N with the relation “a|b” is a poset.
This poset is a lattice. If a, b ∈ N, then a ∧ b is the greatest common divisor of a and
b, and a ∨ b is the least common multiple.

(4) If a finite poset is represented by a Hasse diagram, we can use the Hasse
diagram to check whether or not the poset is a lattice. In Figure 7.5.1 (i)(ii) we see
posets that are lattices. For example, in Part (i) we see that y ∨ z = x and y ∧ z = w.
On the other hand, the posets in Figure 7.5.1 (iii)(iv) are not lattices. For example,
in Part (iii) of the figure the elements s and t do not have an upper bound, and hence
no least upper bound, and therefore no join. In Part (iv) of the figure the elements y
and z have two upper bounds, but no least upper bound, and therefore no join. A very
thorough discussion of Hasse diagrams of lattices is given in [Dub64, pp. 9-19].

(5) Let ≼ be the relation on N defined by a ≼ b if and only if b = a^k for some
k ∈ N, for all a, b ∈ N. It was proved in Exercise 7.4.3 that (N, ≼) is a poset. This
poset is not a lattice, however, because meets and joins do not always exist. For
example, the numbers 2 and 3 have neither a lower bound nor an upper bound, and
hence neither a greatest lower bound nor a least upper bound. Suppose to the contrary
that c is an upper bound of {2, 3}. It follows that 2 ≼ c and 3 ≼ c, and therefore there
are k, j ∈ N such that c = 2^k and c = 3^j. Hence 2^k = 3^j, which cannot be the case.
Now suppose that d is a lower bound of {2, 3}. Then d ≼ 2 and d ≼ 3, and therefore
there are p, q ∈ N such that 2 = d^p and 3 = d^q, which again cannot happen, because p
and q are natural numbers. Some meets and joins do exist in this poset, for example
4 ∧ 8 = 2 and 4 ∨ 8 = 64. ♦
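
Parts (3) and (5) of this example can be explored numerically. The Python sketch below is only an illustration: the helper names divides, is_power, glb and lub are ours, and the search is restricted to the finite set {1, ..., 100}, so a result of None only says that no bound was found in that range. For the particular pairs used here the bounds found in the truncated range agree with those computed in the example.

```python
def divides(a, b):
    """The relation a|b."""
    return b % a == 0


def is_power(a, b):
    """True when b = a**k for some natural number k >= 1."""
    if a == 1:
        return b == 1
    p = a
    while p < b:
        p *= a
    return p == b


def glb(universe, leq, subset):
    """Greatest lower bound of subset within universe, or None."""
    lower = [d for d in universe if all(leq(d, s) for s in subset)]
    greatest = [d for d in lower if all(leq(e, d) for e in lower)]
    return greatest[0] if greatest else None


def lub(universe, leq, subset):
    """Least upper bound of subset within universe, or None."""
    upper = [u for u in universe if all(leq(s, u) for s in subset)]
    least = [u for u in upper if all(leq(u, v) for v in upper)]
    return least[0] if least else None


universe = range(1, 101)

# Divisibility: meet is the gcd, join is the lcm.
print(glb(universe, divides, [12, 18]), lub(universe, divides, [12, 18]))

# The relation b = a**k: some meets and joins exist, others do not.
print(glb(universe, is_power, [4, 8]), lub(universe, is_power, [4, 8]))
print(glb(universe, is_power, [2, 3]), lub(universe, is_power, [2, 3]))
```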
[Figure: Hasse diagrams of four posets, in panels (i), (ii), (iii) and (iv).]
Fig. 7.5.1.


From Example 7.5.2 (1) and Example 7.4.11 (2), we see that whereas the least 
upper bound and the greatest lower bound of any pair of elements in a lattice must 
exist, the least upper bound and the greatest lower bound of an arbitrary subset of a 
lattice need not exist (though in some cases they do). Exercise 7.5.5 shows that for 
finite lattices there are no such problems. 

The following theorem gives various standard properties of meet and join in lat- 
tices. See [LP98, Section 1.1] for more such properties. 


Theorem 7.5.3. Let (L, ≼) be a lattice, and let x, y, z ∈ L.


1. x ∧ y ≼ x and x ∧ y ≼ y and x ≼ x ∨ y and y ≼ x ∨ y.
2. x ∧ x = x and x ∨ x = x (Idempotent Laws).
3. x ∧ y = y ∧ x and x ∨ y = y ∨ x (Commutative Laws).
4. x ∧ (y ∧ z) = (x ∧ y) ∧ z and x ∨ (y ∨ z) = (x ∨ y) ∨ z (Associative Laws).
5. x ∧ (x ∨ y) = x and x ∨ (x ∧ y) = x (Absorption Laws).
6. x ≼ y if and only if x ∧ y = x if and only if x ∨ y = y.
7. If x ≼ y, then x ∧ z ≼ y ∧ z and x ∨ z ≼ y ∨ z.


Proof. We will prove Parts (4) and (5), leaving the rest to the reader in Exercise 7.5.3.


(4). We will prove that x ∧ (y ∧ z) = (x ∧ y) ∧ z; the proof that x ∨ (y ∨ z) = (x ∨ y) ∨ z
is similar, and we omit the details. Let d = x ∧ (y ∧ z). By Part (1) of this theorem
we know that d ≼ x and d ≼ y ∧ z. Applying Part (1) again, we see that d ≼ y and
d ≼ z. Because d is a lower bound of x and y, it follows from the definition of meet as
greatest lower bound that d ≼ x ∧ y. Similarly, because d is a lower bound of x ∧ y and
z, it follows that d ≼ (x ∧ y) ∧ z. Hence x ∧ (y ∧ z) ≼ (x ∧ y) ∧ z. A similar argument
shows that (x ∧ y) ∧ z ≼ x ∧ (y ∧ z); we omit the details. By the antisymmetry of ≼,
we deduce that x ∧ (y ∧ z) = (x ∧ y) ∧ z.


(5). We will prove that x ∨ (x ∧ y) = x; the proof that x ∧ (x ∨ y) = x is similar,
and we omit the details. By the reflexivity of ≼ we know that x ≼ x, and by Part (1)
of this theorem we know that x ∧ y ≼ x. Therefore x is an upper bound of x and x ∧ y,
and by the definition of join as least upper bound, we deduce that x ∨ (x ∧ y) ≼ x. On
the other hand, by Part (1) we know that x ≼ x ∨ (x ∧ y). By the antisymmetry of ≼,
we deduce that x ∨ (x ∧ y) = x.


We see in Theorem 7.5.3 that some of the standard algebraic properties of ad- 
dition and multiplication of numbers also hold for lattices. However, not all fa- 
miliar properties of addition and multiplication of numbers hold for every lattice, 
for example the Distributive Law. This law does hold for the lattice in Exam- 
ple 7.5.2 (2), as seen from Theorem 3.3.3 (5), but it does not hold in the lattice 
represented by the Hasse diagram in Figure 7.5.1 (ii). In that Hasse diagram, we see 
that b ∧ (c ∨ d) = b ∧ a = b, whereas (b ∧ c) ∨ (b ∧ d) = e ∨ e = e. Exercise 7.5.7 gives
two inequalities related to the Distributive Law that hold in all lattices. 

We started our discussion of posets and lattices in Section 7.4 by stating that 
we are interested in algebraic structures involving order relations rather than binary 
operations (which are discussed in Section 7.1). Though posets truly involve only 
an order relation, in lattices there are two binary operations, namely, meet and join. 
(Indeed, it is because meet and join are binary operations that we prefer the notation 
a ∧ b and a ∨ b rather than the notation glb{a, b} and lub{a, b}, respectively.) The
binary operations meet and join satisfy certain properties, some of which were given 
in Theorem 7.5.3. As shown in the following theorem, we can in fact reformulate the 
definition of lattices as sets with two binary operations that satisfy certain properties, 
which in turn give rise to the appropriate type of order relation. The basic idea for 
this theorem is Theorem 7.5.3 (6), which expresses the partial ordering relation in 
terms of meet and join. 


Theorem 7.5.4. Let A be a set, and let ⊓ : A × A → A and ⊔ : A × A → A be binary
operations on A. Suppose that ⊓ and ⊔ satisfy the following properties. Let x, y, z ∈ A.


a. x ⊓ y = y ⊓ x and x ⊔ y = y ⊔ x.
b. x ⊓ (y ⊓ z) = (x ⊓ y) ⊓ z and x ⊔ (y ⊔ z) = (x ⊔ y) ⊔ z.
c. x ⊓ (x ⊔ y) = x and x ⊔ (x ⊓ y) = x.


Let ≼ be the relation on A defined by x ≼ y if and only if x ⊓ y = x, for all x, y ∈ A.
Then (A, ≼) is a lattice, with ⊓ and ⊔ the meet and join of the lattice, respectively.


Proof. We follow [Bir48] and [LP98] in part. As a preliminary, we prove the
following two facts: (1) x ⊓ x = x for all x ∈ A; and (2) x ⊓ y = x if and only if
x ⊔ y = y, for all x, y ∈ A. Let x, y, z ∈ A. Using both parts of Property (c), we see
that x ⊓ x = x ⊓ (x ⊔ (x ⊓ x)) = x, which proves Fact (1). Suppose that x ⊓ y = x. Then
by Properties (a) and (c) we see that x ⊔ y = (x ⊓ y) ⊔ y = y ⊔ (y ⊓ x) = y, which proves
one of the implications in Fact (2); a similar argument proves the other implication,
and we omit the details.

We now show that (A, ≼) is a poset. Because x ⊓ x = x by Fact (1), it follows
from the definition of ≼ that x ≼ x. Hence ≼ is reflexive. Now suppose that x ≼ y and
y ≼ z. Then x ⊓ y = x and y ⊓ z = y. By Property (b) we see that x ⊓ z = (x ⊓ y) ⊓ z =
x ⊓ (y ⊓ z) = x ⊓ y = x. It follows that x ≼ z. Therefore ≼ is transitive. Next, suppose
that x ≼ y and y ≼ x. Then x ⊓ y = x and y ⊓ x = y. It follows from Property (a) that
x = y. Therefore ≼ is antisymmetric. We conclude that (A, ≼) is a poset.

Finally, we show that ⊓ and ⊔ are the meet and join of (A, ≼), respectively. It
will then follow from this fact that meet and join always exist for any two elements
of A, and hence (A, ≼) is a lattice. We start with ⊓. Using Property (b) and Fact (1)
we see that (x ⊓ y) ⊓ y = x ⊓ (y ⊓ y) = x ⊓ y. Hence x ⊓ y ≼ y. Because x ⊓ y = y ⊓ x
by Property (a), a similar argument shows that x ⊓ y ≼ x. Therefore x ⊓ y is a lower
bound of {x, y}. Now suppose that z ∈ A is a lower bound of {x, y}. Then z ≼ x and
z ≼ y, and therefore z ⊓ x = z and z ⊓ y = z. By Property (b) we see that z ⊓ (x ⊓ y) =
(z ⊓ x) ⊓ y = z ⊓ y = z. Hence z ≼ (x ⊓ y). It follows that x ⊓ y is the greatest lower
bound of {x, y}, which means that x ⊓ y is the meet of x and y.

We now turn to ⊔. By Property (c) we know that x ⊓ (x ⊔ y) = x. Hence x ≼ x ⊔ y.
Because x ⊔ y = y ⊔ x by Property (a), a similar argument shows that y ≼ x ⊔ y. Hence
x ⊔ y is an upper bound of {x, y}. Now suppose that w ∈ A is an upper bound of {x, y}.
Then x ≼ w and y ≼ w, and therefore x ⊓ w = x and y ⊓ w = y. By Fact (2) we deduce
that x ⊔ w = w and y ⊔ w = w. Property (b) then implies that (x ⊔ y) ⊔ w = x ⊔ (y ⊔ w) =
x ⊔ w = w. Hence (x ⊔ y) ⊓ w = x ⊔ y by Fact (2). Therefore x ⊔ y ≼ w. It follows that
x ⊔ y is the least upper bound of {x, y}, which means that x ⊔ y is the join of x and y.


Whereas Theorem 7.5.4 says that it is possible to view lattices as being defined by 
binary operations, which is useful in some approaches to the subject, it is nonetheless 
often useful to view lattices as we did originally, based upon partial orderings. 
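
As a sanity check of Theorem 7.5.4 on one finite example, the following Python sketch takes A to be the set of divisors of 60, with gcd and lcm playing the roles of the two binary operations. The choice of example and the helper names are ours; the code verifies Properties (a), (b) and (c) by brute force, and then checks that the relation defined from the first operation is divisibility and that this operation gives greatest lower bounds.

```python
from itertools import product
from math import gcd

A = [d for d in range(1, 61) if 60 % d == 0]  # the divisors of 60


def meet(x, y):
    return gcd(x, y)


def join(x, y):
    return x * y // gcd(x, y)  # least common multiple


def satisfies_properties(A, meet, join):
    """Check Properties (a), (b) and (c) of the theorem for all x, y, z."""
    for x, y, z in product(A, repeat=3):
        commutative = meet(x, y) == meet(y, x) and join(x, y) == join(y, x)
        associative = (meet(x, meet(y, z)) == meet(meet(x, y), z)
                       and join(x, join(y, z)) == join(join(x, y), z))
        absorption = meet(x, join(x, y)) == x and join(x, meet(x, y)) == x
        if not (commutative and associative and absorption):
            return False
    return True


def leq(x, y):
    """The relation of the theorem: x <= y if and only if meet(x, y) = x."""
    return meet(x, y) == x


def meet_is_glb(A, meet, leq):
    """Check that meet(x, y) is the greatest lower bound of {x, y}."""
    for x, y in product(A, repeat=2):
        m = meet(x, y)
        if not (leq(m, x) and leq(m, y)):
            return False
        if any(leq(z, x) and leq(z, y) and not leq(z, m) for z in A):
            return False
    return True


print(satisfies_properties(A, meet, join))
print(all(leq(x, y) == (y % x == 0) for x, y in product(A, repeat=2)))
print(meet_is_glb(A, meet, leq))
```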

In Section 7.4 we discussed order homomorphisms and order isomorphisms of 
posets. Because lattices are posets, we can apply such functions to lattices. Addition- 
ally, there are two other types of functions that are suited to lattices, though not to 
arbitrary posets. 


Definition 7.5.5. Let (L, ≼) and (M, ≼′) be lattices, and let f : L → M be a function.
Let ∧ and ∨ be the meet and join for L, and let ∧′ and ∨′ be the meet and join
for M. The function f is a meet homomorphism if f(x ∧ y) = f(x) ∧′ f(y) for all
x, y ∈ L. The function f is a join homomorphism if f(x ∨ y) = f(x) ∨′ f(y) for all
x, y ∈ L. △


Example 7.5.6. 


(1) The function f : D → P(A) in Example 7.4.17 (2) is both a meet homomorphism
and a join homomorphism, as the reader can verify.

(2) Let (L, ≼) and (M, ≼′) be the lattices represented by the Hasse diagrams in
Figure 7.5.1 (i)(ii). Let f : L → M be defined by

f(s) = a, if s = x
f(s) = e, otherwise.

The function f is a meet homomorphism. If s, t ∈ L are not both x, then s ∧ t ≠ x, and
hence f(s ∧ t) = e = e ∧ e = f(s) ∧ f(t); also, we observe that f(x ∧ x) = f(x) = a =
a ∧ a = f(x) ∧ f(x). The function f is not a join homomorphism, because f(y ∨ z) =
f(x) = a, but f(y) ∨ f(z) = e ∨ e = e. A similar construction yields a function L → M
that is a join homomorphism but not a meet homomorphism.

(3) The function s : P_F(N) → Z in Example 7.4.17 (1) is an order homomorphism,
as was stated in that example. However, this function is neither a meet
homomorphism nor a join homomorphism. For example, let X = {5, 7}, and let Y = {7, 9}.
Then, as in Example 7.5.2 (2), we see that X ∧ Y = X ∩ Y = {7}, and X ∨ Y = X ∪ Y =
{5, 7, 9}. Hence s(X ∧ Y) = 1 and s(X ∨ Y) = 3. However, as discussed in
Example 7.5.2 (1), we see that s(X) ∧ s(Y) = 2 ∧ 2 = 2, and s(X) ∨ s(Y) = 2 ∨ 2 = 2.
Hence s(X ∧ Y) ≠ s(X) ∧ s(Y) and s(X ∨ Y) ≠ s(X) ∨ s(Y). ♦
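
The computation in Part (3) is easy to reproduce. In the Python sketch below (an illustration only, with variable names of our choosing), finite sets are represented as frozensets, the meet and join of sets are intersection and union, and the meet and join of integers under ≤ are min and max.

```python
from itertools import combinations

X = frozenset({5, 7})
Y = frozenset({7, 9})
s = len  # s(X) = |X|

# Meet and join on each side of the function s.
meet_preserved = s(X & Y) == min(s(X), s(Y))
join_preserved = s(X | Y) == max(s(X), s(Y))

# s is nevertheless order preserving; check this on all subsets of {5, 7, 9}.
subsets = [frozenset(c) for r in range(4) for c in combinations([5, 7, 9], r)]
order_preserved = all(s(U) <= s(V) for U in subsets for V in subsets if U <= V)

print(meet_preserved, join_preserved, order_preserved)
```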


We now have four types of functions that we can use with lattices, namely, or- 
der homomorphisms, order isomorphisms, meet homomorphisms and join homo- 
morphisms. How are these different types of functions related? We saw in Exam- 
ple 7.4.17 (1) that a function can be an order homomorphism without being an order 
isomorphism. We saw in Example 7.5.6 (2) that a function can be a meet homomor- 
phism without being a join homomorphism, and vice versa. Parts (1) and (3) of the 
following theorem clarify the relations between the four types of functions. 


Theorem 7.5.7. Let (L, ≼) and (M, ≼′) be lattices, and let f : L → M be a function.


1. If f is a meet homomorphism or a join homomorphism, then it is an order 
homomorphism. 

2. If f is bijective and a meet (respectively, join) homomorphism, then f⁻¹ is a
meet (respectively, join) homomorphism. 

3. The function f is an order isomorphism if and only if f is bijective and a 
meet homomorphism if and only if f is bijective and a join homomorphism. 


Proof. We will prove Part (1), leaving the rest to the reader in Exercise 7.5.14. 


(1). Suppose that f is a meet homomorphism. Let ∧ and ∧′ denote the meet for
L and M, respectively. Let x, y ∈ L. Suppose that x ≼ y. Then by Theorem 7.5.3 (6)
we know that x = x ∧ y. Then f(x) = f(x ∧ y) = f(x) ∧′ f(y), because f is a meet
homomorphism. Using Theorem 7.5.3 (6) again, we deduce that f(x) ≼′ f(y). It
follows that f is an order homomorphism. A similar argument works if f is a join
homomorphism; we omit the details.


Because of Theorem 7.5.7 (3), we consider two lattices to be essentially the same 
if there is an order isomorphism between them, or, equivalently, if there is a bijective 
meet homomorphism or a bijective join homomorphism between them. 

We conclude this section with a nice result that involves the notion of a fixed 
point of a function, that is, an element taken to itself by the function. Fixed points 
arise in many parts of mathematics, for example the famous Brouwer Fixed Point 
Theorem in topology (see [Nab80, p. 29]), as well as in applications of mathematics 
to economics (see [Deb] or [KR83b, pp. 38-39]). The following theorem gives a 
criterion that guarantees the existence of fixed points for certain functions of lattices 
to themselves. 
Theorem 7.5.8. Let (L, ≼) be a lattice, and let f : L → L be an order homomorphism.
Suppose that the least upper bound and greatest lower bound exist for all
non-empty subsets of L. Then there is some a ∈ L such that f(a) = a.


Proof. Let C = {x ∈ L | x ≼ f(x)}. Observe that L is non-empty because it is a poset,
and all posets are assumed to be non-empty. Let m be the greatest lower bound of
L, which exists by hypothesis. Then m is a lower bound of L, and therefore m ≼ x
for all x ∈ L. In particular, we see that m ≼ f(m). It follows that m ∈ C, and so C is
non-empty.

Let a be the least upper bound of C. Let x ∈ C. Then a is an upper bound of
C, and therefore x ≼ a. Using the definition of C and the fact that f is an order
homomorphism, we deduce that x ≼ f(x) ≼ f(a). It follows that f(a) is an upper
bound for C. Because a is the least upper bound of C, we deduce that a ≼ f(a).
Because f is an order homomorphism, it follows that f(a) ≼ f(f(a)). Hence f(a) ∈
C, and therefore f(a) ≼ a, because a is an upper bound of C. By antisymmetry, we
deduce that f(a) = a.


Corollary 7.5.9. Let (L, ≼) be a lattice, and let f : L → L be an order homomorphism.
If L is finite, then there is some a ∈ L such that f(a) = a.


Proof. This corollary follows immediately from Exercise 7.5.5 and Theorem 7.5.8. 


Theorem 7.5.8 does not necessarily hold for lattices that do not satisfy the addi- 
tional hypothesis concerning least upper bounds and greatest lower bounds. Consider 
the lattice (N, ≤) and the function f : N → N defined by f(x) = x + 1 for all x ∈ N.
This function is an order homomorphism, and yet there is no a ∈ N such that f(a) = a.
Of course, arbitrary subsets of N do not necessarily have least upper bounds, so The- 
orem 7.5.8 does not apply. 
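
The proof of Theorem 7.5.8 is constructive enough to run on a finite lattice. The Python sketch below is only an illustration under our own choices: the lattice is the power set of {1, 2, 3} ordered by inclusion, the function f and all helper names are hypothetical, and least upper bounds are found by brute force. It forms the set C = {x | x ≼ f(x)} from the proof and returns its least upper bound.

```python
from itertools import combinations


def powerset(elements):
    elements = list(elements)
    return [frozenset(c) for r in range(len(elements) + 1)
            for c in combinations(elements, r)]


def least_upper_bound(lattice, leq, subset):
    """Brute-force least upper bound of subset inside a finite lattice."""
    uppers = [u for u in lattice if all(leq(x, u) for x in subset)]
    return next(u for u in uppers if all(leq(u, v) for v in uppers))


def is_order_hom(lattice, leq, f):
    return all(leq(f(x), f(y)) for x in lattice for y in lattice if leq(x, y))


def fixed_point(lattice, leq, f):
    """The element a from the proof: the least upper bound of C."""
    C = [x for x in lattice if leq(x, f(x))]
    return least_upper_bound(lattice, leq, C)


L = powerset({1, 2, 3})
subset = lambda X, Y: X <= Y


def f(X):
    # An order preserving map chosen for illustration.
    return (X & frozenset({1, 2})) | frozenset({1})


a = fixed_point(L, subset, f)
print(is_order_hom(L, subset, f), sorted(a), f(a) == a)
```

Replacing f by a map that is not order preserving may produce an element a with f(a) ≠ a, which is consistent with the hypothesis of the theorem being needed.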


Exercises 


Exercise 7.5.1. Which of the posets given in Exercise 7.4.4 are lattices? 


Exercise 7.5.2. Which of the posets represented by Hasse diagrams in Figure 7.5.2 
are lattices? 


Exercise 7.5.3. [Used in Theorem 7.5.3.] Prove Theorem 7.5.3 (1) (2) (3) (6) (7). 


Exercise 7.5.4. Find Hasse diagrams corresponding to all possible distinct lattices 
with five elements. 


Exercise 7.5.5. [Used in Section 7.5 and Corollary 7.5.9.] Let (L, ≼) be a lattice. Prove
that if X ⊆ L is a finite subset, then X has a least upper bound and a greatest lower
bound. Deduce that if L is finite and if X ⊆ L is a subset, then X has a least upper
bound and a greatest lower bound.


Exercise 7.5.6. Let (L, ≼) be a lattice, and let a, b ∈ L. Prove that a ∧ b = a ∨ b if
and only if a = b.
Fig. 7.5.2.


Exercise 7.5.7. [Used in Section 7.5.] Let (L, ≼) be a lattice, and let a, b, c ∈ L. Prove
the following inequalities.


(1) a ∧ (b ∨ c) ≽ (a ∧ b) ∨ (a ∧ c) (Distributive Inequality).
(2) a ∨ (b ∧ c) ≼ (a ∨ b) ∧ (a ∨ c) (Distributive Inequality).
(3) If a ≽ c, then a ∧ (b ∨ c) ≽ (a ∧ b) ∨ c (Modular Inequality).
(4) If a ≼ c, then a ∨ (b ∧ c) ≼ (a ∨ b) ∧ c (Modular Inequality).


Exercise 7.5.8. [Used in Exercise 7.5.9, Exercise 7.5.11 and Exercise 7.5.12.] Let (L, ≼)
be a lattice. Prove that a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c) for all a, b, c ∈ L if and only if
a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c) for all a, b, c ∈ L.

The lattice (L, ≼) is distributive if either (and hence both) of these conditions
holds.


Exercise 7.5.9. Let (L, ≼) be a lattice, and let a, b, c ∈ L. Suppose that (L, ≼) is
distributive, as defined in Exercise 7.5.8. Prove that if a ∧ c = b ∧ c and a ∨ c = b ∨ c,
then a = b.


Exercise 7.5.10. [Used in Exercise 7.5.11 and Exercise 7.5.12.] Let (L, ≼) be a lattice.
The lattice (L, ≼) is complemented if it has a least element O and a greatest element
I such that O ≠ I, and if for each a ∈ L, there is an element a′ ∈ L such that a ∧ a′ = O
and a ∨ a′ = I.

Suppose that (L, <) is complemented. For each a € L, is a’ unique? Give a proof 
or a counterexample. 


Exercise 7.5.11. [Used in Section 7.5 and Exercise 7.5.12.] Let (L, ≼) be a lattice. The
lattice (L, ≼) is a boolean algebra if it is distributive and complemented, as defined
in Exercise 7.5.8 and Exercise 7.5.10 respectively.

Suppose that (L, ≼) is a boolean algebra. Let a, b ∈ L.


(1) Prove that a′ is unique.
(2) Prove that (a ∧ b)′ = a′ ∨ b′ and (a ∨ b)′ = a′ ∧ b′.
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Exercise 7.5.12. Which of the following lattices are distributive, complemented 
and/or boolean algebras, as defined in Exercise 7.5.8, Exercise 7.5.10 and Exer- 
cise 7.5.11, respectively? 


(1) The lattice in Example 7.5.2 (2). 
(2) The lattice in Example 7.5.2 (3). 
(3) The lattices represented by the Hasse diagrams in Figure 7.5.3. 


[Hasse diagrams (i), (ii) and (iii)]
Fig. 7.5.3


Exercise 7.5.13. Let (L, ≼) and (M, ≼′) be the lattices represented by the Hasse
diagrams in Figure 7.5.3 (ii) and Figure 7.5.2 (iii), respectively. Described below are
various functions L → M. Which of these functions is an order homomorphism, a
meet homomorphism, a join homomorphism and/or an order isomorphism?


(1) f(a) = f(b) = f(c) = f(d) = f(e) = 1.
(2) f(a) = f(b) = f(c) = f(d) = 1, and f(e) =
(3) f(a) = f(b) = f(c) = f(e) = 1, and f(d) =
(4) f(b) = f(c) = f(d) = 3, and f(a) = f(e) = 2.
(5) f(a) = 1, and f(b) = 2, and f(c) = 3, and f(d) = 4, and f(e) =


Exercise 7.5.14. [Used in Theorem 7.5.7.] Prove Theorem 7.5.7 (2) (3). 


7.6 Counting: Products and Sums 


Some very interesting, and extremely applicable, mathematical questions involve 
counting. Aspects of number theory, probability, graph theory and optimization, 
for example, all use counting arguments. A branch of contemporary mathematics, 
called combinatorics, deals with counting questions in very sophisticated ways. See 
[Bog90] for a very nice treatment of combinatorics at a level appropriate to anyone 
who has finished the present text; see [Rob84] for many applications of counting.

A counting problem, in the terminology we have developed so far, is the determination
of the cardinality of a finite set. The difficulty arises when the elements of
the set are described, possibly quite indirectly, but are not listed explicitly. Suppose,
for example, that we want to find the number of integers from 1 to 20 that are not
divisible by any of 3, 5 or 13. That is, we want to find the cardinality of the set


S = {n ∈ N | 1 ≤ n ≤ 20 and n is not divisible by 3, 5 or 13}.


This problem is trivial, of course, because we can list the elements of this set explicitly
as S = {1, 2, 4, 7, 8, 11, 14, 16, 17, 19}. Hence |S| = 10. Now suppose that we wanted
to find the number of integers from 1 to 1,000,000 that are not divisible by any of
3, 5 or 13. Here it would be a very unpleasant task to list all the elements of the
set explicitly. We will answer this problem without listing the elements of the set in
Example 7.6.11 (2), after we have developed some useful techniques.
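
Although listing a million elements by hand is unpleasant, a computer can do it by brute force, which gives a useful check on whatever technique is developed later. The following is a minimal sketch in Python; the function name count_not_divisible is a hypothetical helper chosen for illustration, not something from the text.

```python
def count_not_divisible(limit, divisors=(3, 5, 13)):
    """Count the integers n with 1 <= n <= limit divisible by none of the divisors."""
    return sum(
        1
        for n in range(1, limit + 1)
        if all(n % d != 0 for d in divisors)
    )


# The small case from the text: the elements can be listed explicitly.
small_set = [n for n in range(1, 21) if all(n % d != 0 for d in (3, 5, 13))]
small_count = count_not_divisible(20)

# The large case: no explicit listing by hand, only a direct count.
large_count = count_not_divisible(1_000_000)

print(small_set, small_count, large_count)
```

Brute force works here only because the range is small enough to enumerate; it says nothing about why the count comes out as it does.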

In this section and the next we will discuss a few of the most basic ways of fig- 
uring out the cardinalities of finite sets. Our approach is a bit different from many 
standard treatments of the subject—not in the statements of our results, but in our 
approach to proving them. In many texts, such as [Bog90, Chapter 1], the discussion 
of counting starts out by simply stating without proof some basic counting principles 
such as the Product Rule and Sum Rule (which we will discuss shortly). These rules 
are then used both to solve various applied problems, and to yield proofs of mathe- 
matical theorems. An example of such a theorem would be a formula for the number 
of injective functions from one finite set to another, the proof of which is simple if 
we have these basic counting principles at our disposal. 

Our approach is the opposite of what was just described. We are not interested 
in counting problems for their own sake (though they are certainly worthwhile), but 
rather as an interesting and useful application of the ideas developed throughout 
this text. Therefore, instead of hypothesizing the Product Rule and Sum Rule, and 
using counting arguments in our proofs, we will formulate ideas about counting in 
terms of our familiar notions of sets, functions and relations. In particular, we will 
prove the Product Rule and Sum Rule, as well as other results, by relating these 
topics to concepts such as injective functions from one finite set to another, and 
using what we have previously learned. Some of our proofs might therefore appear 
more cumbersome than in other texts, and not focused on counting per se—both of 
which are true, but reasonable given our goals. 

We start our discussion of counting, as do many texts, with the “Product Rule.” 
A typical informal statement of this result is as follows; we use the formulation in 
[Rob84, Chapter 2]. 


Fact 7.6.1 (Product Rule). If something can happen in n ways, and no matter how
the first thing happens, a second thing can happen in m ways, then the two things
together can happen in n · m ways.


Some simple examples of the use of this rule are as follows.


Example 7.6.2.


(1) Fred has seven shirts and five pairs of pants. How many ways can Fred choose
a shirt/pants pair (assuming that Fred does not care whether his shirt and pants
match)? By the Product Rule, there are 7 · 5 = 35 ways.

(2) A committee of 6 people wants to select a chair and a vice-chair. How many
ways can this happen, assuming that no person can simultaneously hold both
positions? If the committee first chooses the chair, there are 6 choices. For each of
these choices, there are then 5 choices for vice-chair. By the Product Rule, there are
6 · 5 = 30 choices for the two positions. If the committee chose the vice-chair first,
the total number of choices for chair and vice-chair would still be 30. Observe that
the collections of “second things” that can happen are not disjoint.

The Product Rule can be generalized to any finite number of things happening.
For example, suppose that the above committee decided to choose not just a chair
and vice-chair but also a treasurer, again stipulating that no person can hold more
than one position. By reasoning as above, we deduce that there are 6 · 5 · 4 = 120
choices. ♦
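
The three counts in Example 7.6.2 can be confirmed by listing the possibilities explicitly. The sketch below uses Python's itertools; the labels for shirts, pants and committee members are placeholders invented for illustration.

```python
from itertools import permutations, product

# Part (1): independent choices, so every (shirt, pants) pair is allowed.
shirts = [f"shirt{i}" for i in range(1, 8)]   # 7 shirts
pants = [f"pants{j}" for j in range(1, 6)]    # 5 pairs of pants
outfits = list(product(shirts, pants))

# Part (2): the vice-chair must differ from the chair, so we list
# ordered pairs of distinct committee members.
committee = ["P1", "P2", "P3", "P4", "P5", "P6"]
chair_and_vice = list(permutations(committee, 2))

# Chair, vice-chair and treasurer: ordered triples of distinct members.
three_officers = list(permutations(committee, 3))

print(len(outfits), len(chair_and_vice), len(three_officers))
```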


Although the Product Rule is often stated in terms of numbers of “ways that things
can happen,” and for practical problem solving that way of formulating it is very 
useful, the expression “ways that things can happen” is not entirely rigorous, and 
does not directly fit into our framework of sets, functions, relations and the like. 
Fortunately, we can reformulate the Product Rule in terms of cardinalities of finite 
sets. To simplify matters we will restrict our attention to the product of two “choices,” 
because it is all we will need later on. 

Observe that in Example 7.6.2 (1), the choice of the “second thing that happens” 
is independent of the choice of the “first thing,” whereas in Part (2) of the example
the second choice is not independent of the first. The former situation is a special 
case of the latter, but it is convenient to deal with this special case first, as we do 
in the following theorem. The proof of this theorem makes use of an important fact 
about the integers, namely, the Division Algorithm, which is stated as Theorem A.5 
in the Appendix. 


Theorem 7.6.3. Let A and B be sets. Suppose that A and B are finite. Then A × B is
finite, and |A × B| = |A| · |B|.


Proof. If A or B is empty, then so is A × B, and the result is trivial. Now suppose that
A and B are both non-empty. Let n = |A| and p = |B|, and let f: A → {1, …, n} and
g: B → {1, …, p} be bijective functions. Let h: A × B → {1, …, np} be defined by
h((a,b)) = (f(a) − 1)p + g(b) for all (a,b) ∈ A × B.

Let x ∈ {1, …, np}. By the Division Algorithm (Theorem A.5 in the Appendix)
there are q, r ∈ Z such that x = pq + r and 0 ≤ r < p. Because 1 ≤ x ≤ np, it follows
that 0 ≤ q ≤ n. There are now two cases. First, suppose that r ≠ 0. Then q ≠ n. By
the surjectivity of f and g, there are a ∈ A and b ∈ B such that f(a) = q + 1 and
g(b) = r. Then h((a,b)) = ((q + 1) − 1)p + r = pq + r = x. Next, suppose that r = 0.
Then q ≠ 0. By the surjectivity of f and g, there are u ∈ A and v ∈ B such that
f(u) = q and g(v) = p. Then h((u,v)) = (q − 1)p + p = pq + 0 = x. It follows that
h is surjective.

Let (a,b), (c,d) ∈ A × B, and suppose that h((a,b)) = h((c,d)). Then (f(a) − 1)p + g(b)
= (f(c) − 1)p + g(d). Hence [f(a) − f(c)]p + [g(b) − g(d)] = 0. Observe
that 0 ≤ |g(b) − g(d)| < p. Because 0 · p + 0 = 0, we can use the uniqueness part of
the Division Algorithm to deduce that f(a) − f(c) = 0 and g(b) − g(d) = 0. Hence
f(a) = f(c) and g(b) = g(d). By the injectivity of f and g, we see that (a,b) = (c,d).
It follows that h is injective.

Because h is bijective, it follows that |A × B| = np = |A| · |B|.
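
The function h in the proof can be tried out on small sets. The following Python sketch builds h from two enumerations f and g and checks that its values are exactly 1, …, np, each taken once. The function name make_h and the sample sets are assumptions made for illustration; the lists are assumed to have no repeated elements.

```python
def make_h(A, B):
    """Return h((a, b)) = (f(a) - 1) * p + g(b), where f and g number A and B from 1."""
    f = {a: i for i, a in enumerate(A, start=1)}   # bijection A -> {1, ..., n}
    g = {b: j for j, b in enumerate(B, start=1)}   # bijection B -> {1, ..., p}
    p = len(B)
    return lambda a, b: (f[a] - 1) * p + g[b]


A = ["x", "y", "z"]          # n = 3
B = ["s", "t"]               # p = 2
h = make_h(A, B)

values = sorted(h(a, b) for a in A for b in B)
print(values)
```

If h were not injective some value would repeat, and if it were not surjective some value in {1, …, np} would be missing; the sorted list exposes both failures at a glance.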


The proof of Theorem 7.6.3 might appear to the reader to be needlessly complicated,
given that the result being proved is intuitively simple. It is, in fact, possible
to prove this theorem without using the Division Algorithm, as is left to the reader
in Exercise 7.6.5. Avoiding the Division Algorithm yields somewhat simpler proofs
than the one given above, but these simpler proofs have the disadvantage of not giving
an explicit bijective function h: A × B → {1, …, |A| · |B|}.

We are now ready for the general case of the Product Rule, which allows for the 
second choices to depend upon the first choice. 


Theorem 7.6.4 (Product Rule). Let A be a set, and let {B_a}_{a∈A} be a family of sets
indexed by A. Suppose that A is finite, that B_a is finite for all a ∈ A and that there is
a set B such that B_a ∼ B for all a ∈ A. Let X = {(a,b) | a ∈ A and b ∈ B_a}. Then X
is finite, and |X| = |A| · |B|.


Proof. By hypothesis there is a bijective function g_a: B_a → B for each a ∈ A. Let
Φ: X → A × B be defined by Φ((a,b)) = (a, g_a(b)) for all (a,b) ∈ X. It is left to
the reader in Exercise 7.6.6 to show that Φ is bijective. It follows that X ∼ A × B.
We now use Theorem 7.6.3 and Corollary 6.6.3 to deduce that X is finite, and that
|X| = |A × B| = |A| · |B|.
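
As a concrete instance of Theorem 7.6.4, take the chair and vice-chair problem of Example 7.6.2 (2): A is the committee, and B_a is the set of people eligible for vice-chair once a is chair. The Python sketch below builds X and the function Φ explicitly. The member names and the choice of B as {1, …, 5} are assumptions made for illustration.

```python
from itertools import product

committee = ["Ann", "Bob", "Cy", "Dee", "Eve", "Flo"]   # the set A
B = [1, 2, 3, 4, 5]                                      # a fixed set with B_a ~ B

# B_a: everyone except a (the second choice depends on the first).
B_a = {a: [x for x in committee if x != a] for a in committee}

# X = {(a, b) | a in A and b in B_a}
X = [(a, b) for a in committee for b in B_a[a]]

# g_a: B_a -> B, one bijection per a (pair elements off in list order).
g = {a: dict(zip(B_a[a], B)) for a in committee}

# Phi((a, b)) = (a, g_a(b))
Phi = {(a, b): (a, g[a][b]) for (a, b) in X}

image = set(Phi.values())
phi_is_bijective = len(image) == len(X) and image == set(product(committee, B))
print(len(X), phi_is_bijective)
```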


The other standard counting rule given in introductory treatments of combi- 
natorics is the “Sum Rule.” A typical informal statement of this result, also from 
[Rob84, Chapter 2], is as follows. 


Fact 7.6.5 (Sum Rule). If one event can occur in n ways and a second event in
m (different) ways, then there are n + m ways in which either the first event or the
second event can occur (but not both).


Observe that the Product Rule, which is about multiplication, involves “and” 
situations, whereas the Sum Rule, which is about addition, involves “or” situations, 
though the meaning here is exclusive “or,” rather than the inclusive “or” regularly 
used by mathematicians (we will discuss the inclusive case shortly). Some simple 
examples of the use of the Sum Rule are as follows. 


Example 7.6.6. 


(1) Murkstown High School has 120 juniors and 95 seniors. The principal has
to pick one junior or one senior to represent the school at a conference. How many
choices are there? Because we may assume that no student is simultaneously a junior
and a senior, by the Sum Rule there are 120 + 95 = 215 choices.

(2) Every resident on planet Blort has either just a first name, or both a first
name and a last name. These names must be chosen from a list of 17 acceptable
choices. How many differently named Blortians can there be? For those Blortians
with only one name, there are 17 possibilities. For those with two names, by the
Product Rule there are 17 · 17 = 289 possibilities. By the Sum Rule, there can be a
total of 17 + 289 = 306 differently named Blortians. ♦


Similarly to the Product Rule, the Sum Rule can also be stated in terms of car- 
dinalities of finite sets, as we do in Part (2) of the following theorem. Part (3) of the 
theorem deals with the inclusive “or” situation; the intuitive idea is that we should 
not double count the elements in the intersection of the two given sets. 


Theorem 7.6.7 (Sum Rule). Let A and B be sets. Suppose that A and B are finite.


1. The sets A ∪ B and A ∩ B are finite.
2. If A and B are disjoint, then |A ∪ B| = |A| + |B|.
3. |A ∪ B| = |A| + |B| − |A ∩ B|.


Proof.


(1). The fact that A ∩ B is finite follows immediately from Theorem 6.6.5 (1),
because A ∩ B ⊆ A. The fact that A ∪ B is finite is shown in Exercise 6.6.1.


(2). This part is a special case of Part (3) of the theorem, which is proved below.


(3). Viewing A ∩ B as a subset of B, it follows from Theorem 6.6.5 (2) that |B| =
|A ∩ B| + |B − (A ∩ B)|. Viewing A as a subset of A ∪ B, we see that |A ∪ B| = |A| +
|(A ∪ B) − A|. Also, we know by Exercise 3.3.9 that (A ∪ B) − A = B − (A ∩ B). Then


|A ∪ B| = |A| + |(A ∪ B) − A| = |A| + |B − (A ∩ B)| = |A| + |B| − |A ∩ B|.


Example 7.6.8. Hicksville has two radio stations, which are WSNF that plays non- 
stop disco, and WRNG that plays only Wagner’s operas. The stations poll 20 people, 
and find that 15 listen to WSNF, 11 listen to WRNG, and 9 listen to both stations. 
From these data we can figure out how many people listen to at least one station, and 
how many listen to neither. Let A be the set of those people surveyed who listen to 
WSNF, and let B denote those who listen to WRNG. Then |A| = 15, and |B| = 11,
and |A ∩ B| = 9. By Theorem 7.6.7 (3) we see that |A ∪ B| = |A| + |B| − |A ∩ B| =
15 + 11 − 9 = 17. Therefore 17 people listen to at least one station, and hence 3
listen to neither. ♦
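
The arithmetic in Example 7.6.8 can be checked against actual sets. In the sketch below the 20 people polled are represented by the numbers 0 to 19, and the particular assignment of listeners is invented for illustration; only the cardinalities 15, 11 and 9 come from the example.

```python
people = set(range(20))

A = set(range(0, 15))     # 15 WSNF listeners: 0, ..., 14
B = set(range(6, 17))     # 11 WRNG listeners: 6, ..., 16, so 9 are in both

union_by_formula = len(A) + len(B) - len(A & B)   # Theorem 7.6.7 (3)
union_directly = len(A | B)
neither = len(people - (A | B))

print(len(A), len(B), len(A & B), union_by_formula, union_directly, neither)
```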


Theorem 7.6.7 can be generalized to the union of finitely many finite sets, rather 
than just two sets, as seen in the following theorem. Recall the definition of a family 
of sets being pairwise disjoint, given in Definition 3.5.1. Part (3) of this theorem is 
often called the “principle of inclusion-exclusion,” and it has many applications in 
combinatorics. See [Rob84, Chapter 6] for various applications, and [Bog90, Section 3.1]
for an interesting reformulation of the statement of this principle.


Theorem 7.6.9. Let A_1, …, A_n be sets for some n ∈ N. Suppose that A_1, …, A_n are
finite.


1. The set A_1 ∪ ⋯ ∪ A_n is finite, and if {r_1, …, r_k} ⊆ {1, …, n}, then
A_{r_1} ∩ ⋯ ∩ A_{r_k} is finite.
2. If A_1, …, A_n are pairwise disjoint, then |A_1 ∪ ⋯ ∪ A_n| = |A_1| + ⋯ + |A_n|.
3.
\[
\begin{aligned}
|A_1 \cup \cdots \cup A_n| &= \sum_{i=1}^{n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| \\
&\qquad - \cdots + (-1)^{n+1} |A_1 \cap \cdots \cap A_n| \\
&= \sum_{p=1}^{n} (-1)^{p+1} \sum_{1 \le i_1 < \cdots < i_p \le n} |A_{i_1} \cap \cdots \cap A_{i_p}|.
\end{aligned}
\]


Proof.


(1). The fact that A_1 ∪ ⋯ ∪ A_n is finite is proved by induction on n, using
Theorem 7.6.7 (1); the details are left to the reader. If {r_1, …, r_k} ⊆ {1, …, n}, then
A_{r_1} ∩ ⋯ ∩ A_{r_k} ⊆ A_{r_1}, and hence A_{r_1} ∩ ⋯ ∩ A_{r_k} is finite by Theorem 6.6.5 (1).


(2). This part is a special case of Part (3) of the theorem, which is proved below. 


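(3). We prove this result by induction on n. If n = 1, both sides of the equation
we need to prove are just |A_1|, so the result is true. Now suppose that the result is
true for n − 1. Making use of Theorem 7.6.7 (3), Theorem 3.4.5 (3) and the inductive
hypothesis we see that

\[
\begin{aligned}
|A_1 \cup \cdots \cup A_n| &= |(A_1 \cup \cdots \cup A_{n-1}) \cup A_n| \\
&= |A_1 \cup \cdots \cup A_{n-1}| + |A_n| - |(A_1 \cup \cdots \cup A_{n-1}) \cap A_n| \\
&= |A_n| + |A_1 \cup \cdots \cup A_{n-1}| - |(A_1 \cap A_n) \cup \cdots \cup (A_{n-1} \cap A_n)| \\
&= |A_n| + \Bigl\{ \sum_{i=1}^{n-1} |A_i| - \sum_{1 \le i < j \le n-1} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n-1} |A_i \cap A_j \cap A_k| \\
&\qquad\qquad - \cdots + (-1)^{n} |A_1 \cap \cdots \cap A_{n-1}| \Bigr\} \\
&\qquad - \Bigl\{ \sum_{i=1}^{n-1} |A_i \cap A_n| - \sum_{1 \le i < j \le n-1} |(A_i \cap A_n) \cap (A_j \cap A_n)| \\
&\qquad\qquad + \cdots + (-1)^{n} |(A_1 \cap A_n) \cap \cdots \cap (A_{n-1} \cap A_n)| \Bigr\} \\
&= |A_n| + \sum_{i=1}^{n-1} |A_i| - \sum_{1 \le i < j \le n-1} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n-1} |A_i \cap A_j \cap A_k| \\
&\qquad - \cdots + (-1)^{n} |A_1 \cap \cdots \cap A_{n-1}| \\
&\qquad - \sum_{i=1}^{n-1} |A_i \cap A_n| + \sum_{1 \le i < j \le n-1} |A_i \cap A_j \cap A_n| \\
&\qquad - \cdots + (-1)^{n+1} |A_1 \cap \cdots \cap A_{n-1} \cap A_n| \\
&= \sum_{i=1}^{n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| \\
&\qquad - \cdots + (-1)^{n+1} |A_1 \cap \cdots \cap A_n|.
\end{aligned}
\]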
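
The alternating sum in Part (3) of Theorem 7.6.9 translates directly into a double loop: the outer loop runs over the number p of sets being intersected, and the inner loop over all choices of p of the sets. The following Python sketch does this and compares the result with the size of the union computed directly. The function name union_size and the sample sets are assumptions made for illustration.

```python
from itertools import combinations


def union_size(sets):
    """Compute |A_1 ∪ ... ∪ A_n| by inclusion-exclusion over all non-empty subfamilies."""
    total = 0
    for p in range(1, len(sets) + 1):
        sign = (-1) ** (p + 1)                       # + for odd p, - for even p
        for chosen in combinations(sets, p):
            total += sign * len(set.intersection(*chosen))
    return total


family = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}, {1, 7}]

by_formula = union_size(family)
directly = len(set().union(*family))
print(by_formula, directly)
```

The formula visits 2^n − 1 subfamilies, so for actual computation with explicit sets the direct union is far cheaper; the formula earns its keep when only the sizes of the intersections are known.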


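Corollary 7.6.10. Let X be a set, and let A_1, …, A_n ⊆ X for some n ∈ N. Suppose
that X is finite. Then

\[
\begin{aligned}
|X - (A_1 \cup \cdots \cup A_n)| &= |X| - \sum_{i=1}^{n} |A_i| + \sum_{1 \le i < j \le n} |A_i \cap A_j| \\
&\qquad - \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| + \cdots + (-1)^{n} |A_1 \cap \cdots \cap A_n|.
\end{aligned}
\]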


Proof. This corollary follows from Theorem 6.6.5 (2) and Theorem 7.6.9 (3). 
Example 7.6.11. 


(1) A class of 30 students was surveyed to find out how many students liked 
bananas, pickles and/or ice cream. The survey showed that 11 liked bananas, 16 liked 
pickles, 17 liked ice cream, 5 liked both bananas and pickles, 4 liked both bananas 
and ice cream, 8 liked both pickles and ice cream, and everyone in the class liked at 
least one of these foods. The survey forgot to ask how many students liked all three 
of the foods, but we can figure that out from the given data. Let B, P and I denote the
sets of students who like bananas, pickles and ice cream, respectively. The survey
then says that |B| = 11, and |P| = 16, and |I| = 17, and |B ∩ P| = 5, and |B ∩ I| = 4,
and |P ∩ I| = 8, and |B ∪ P ∪ I| = 30. By Theorem 7.6.9 (3) we see that


|B ∪ P ∪ I| = (|B| + |P| + |I|) − (|B ∩ P| + |B ∩ I| + |P ∩ I|) + |B ∩ P ∩ I|,


which yields


30 = (11 + 16 + 17) − (5 + 4 + 8) + |B ∩ P ∩ I|.


Hence |B ∩ P ∩ I| = 3, which is the number of students who like all three foods.

(2) We can now solve a problem stated at the beginning of this section, which is
to find the number of integers from 1 to 1,000,000 that are not divisible by any of 3,
5 or 13. Let


X = {n ∈ N | 1 ≤ n ≤ 1,000,000},
B_3 = {n ∈ X | n is divisible by 3},
B_5 = {n ∈ X | n is divisible by 5},
B_13 = {n ∈ X | n is divisible by 13}.


We wish to find |X − (B_3 ∪ B_5 ∪ B_13)|, which we will do by Corollary 7.6.10. To
find |B_3|, we observe that every third integer is divisible by 3, so that the number of
integers from 1 to 1,000,000 that are divisible by 3 will be the greatest integer less
than or equal to 1,000,000/3. Hence |B_3| = 333,333. Similarly we can see that |B_5| =
200,000 and |B_13| = 76,923. Next, an integer will be in B_3 ∩ B_5 if and only if it is
divisible by both 3 and 5, which is equivalent to being divisible by 15. Therefore
|B_3 ∩ B_5| will be the greatest integer less than or equal to 1,000,000/15, which is 66,666.
Similarly we can see that |B_3 ∩ B_13| = 25,641, that |B_5 ∩ B_13| = 15,384 and that
|B_3 ∩ B_5 ∩ B_13| = 5,128. By Corollary 7.6.10 we see that


|X − (B_3 ∪ B_5 ∪ B_13)|
= |X| − (|B_3| + |B_5| + |B_13|)
+ (|B_3 ∩ B_5| + |B_3 ∩ B_13| + |B_5 ∩ B_13|) − |B_3 ∩ B_5 ∩ B_13|
= 1,000,000 − (333,333 + 200,000 + 76,923)
+ (66,666 + 25,641 + 15,384) − 5,128
= 492,307. ♦
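
The computation in Part (2) of this example is mechanical enough to automate. The Python sketch below applies Corollary 7.6.10, using the fact that the number of integers from 1 to N divisible by d is the greatest integer less than or equal to N/d. It assumes, as in the example, that the divisors are pairwise relatively prime, so that being divisible by several of them is the same as being divisible by their product. The function name count_avoiding is a hypothetical helper.

```python
from itertools import combinations
from math import prod


def count_avoiding(limit, divisors):
    """Count n in {1, ..., limit} divisible by none of the pairwise coprime divisors."""
    total = 0
    for p in range(len(divisors) + 1):
        for chosen in combinations(divisors, p):
            # prod(()) == 1, so the p = 0 term is |X| = limit itself.
            total += (-1) ** p * (limit // prod(chosen))
    return total


answer_small = count_avoiding(20, (3, 5, 13))
answer_large = count_avoiding(1_000_000, (3, 5, 13))
print(answer_small, answer_large)
```

For divisors that share common factors the product would have to be replaced by the least common multiple; that case is not handled here.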


Another consequence of Theorem 7.6.9 is the following result, the statement of 
which may seem obvious, but is worth proving, because we will use it in the next 
section. 


Corollary 7.6.12. Let A be a set, and let ∼ be an equivalence relation on A. Suppose
that A is finite, and that all the equivalence classes of A with respect to ∼ have the
same cardinality. If N is the number of equivalence classes, and S is the number of
elements in each equivalence class, then |A| = N · S.


Proof. Let A_1, …, A_N be the equivalence classes of A with respect to ∼. Then |A_i| =
S for all i ∈ {1, …, N}. By Theorem 5.3.4 we know that A_1, …, A_N are pairwise
disjoint, and that A = A_1 ∪ ⋯ ∪ A_N. It now follows from Theorem 7.6.9 (2) that
|A| = |A_1| + ⋯ + |A_N| = N · S.
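
A small instance of Corollary 7.6.12 may be helpful. In the Python sketch below, A = {0, …, 11} and two numbers are equivalent when they leave the same remainder on division by 4; this particular relation is an assumption chosen for illustration.

```python
from collections import defaultdict

A = range(12)

# Group the elements of A into equivalence classes, keyed by remainder mod 4.
classes = defaultdict(set)
for x in A:
    classes[x % 4].add(x)

N = len(classes)                                   # number of equivalence classes
class_sizes = {len(c) for c in classes.values()}   # all classes should share one size
S = class_sizes.pop() if len(class_sizes) == 1 else None

print(N, S, len(A))
```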


Exercises 


Exercise 7.6.1. Murray has 231 compact disks. He wants to lend one disk to his 
father and one to his mother. How many ways can he do this? 


Exercise 7.6.2. Bonesville has 1000 residents. Explain why at least two of them 
must have the same initials, if they use only their first names and last names, and if 
they use letters only from the English alphabet. If they use middle initials as well, 
must it be the case that two residents have the same initials? (Assume that every 
resident has precisely one middle name.) 


Exercise 7.6.3. A cheese factory labels each of its products with a code that has two
letters and one single-digit number. The codes must start with either the letter G or
B. How many possible codes are there?


Exercise 7.6.4. The first grade and second grade students at the Blabbertown Elementary
School decide to send a delegation to the school principal to complain about
the school lunches. The delegation is to have either two second graders, or one second
grader and one first grader. There are 23 first graders and 27 second graders.
How many possible delegations are there?


Exercise 7.6.5. [Used in Section 7.6.] The goal of this exercise is to give proofs of 
Theorem 7.6.3 that do not use the Division Algorithm. 


(1) Prove Theorem 7.6.3 using induction on |A| and Theorem 7.6.7 (2). 

(2) Prove Theorem 7.6.3 using Theorem 7.6.9 (2); note that the proof of that 
theorem uses induction, and it is therefore not surprising that induction is not 
needed in this part of the exercise. 


Exercise 7.6.6. [Used in Theorem 7.6.4.] Prove that the function Φ defined in the
proof of Theorem 7.6.4 is bijective.


Exercise 7.6.7. A pair of new parents decide to test 10 different brands of diapers on 
their newborn baby. They find that 7 brands leak, 5 brands do not stay on properly, 
and 4 brands both leak and do not stay on properly. 


(1) How many brands have at least one of the problems? 
(2) How many brands have neither problem? 


Exercise 7.6.8. A laboratory study of 50 rabbits showed that 29 liked carrots, 18 
liked lettuce, 27 liked bratwurst, 9 liked both carrots and lettuce, 16 liked both carrots 
and bratwurst, 8 liked both lettuce and bratwurst, and 47 liked at least one of the three 
foods. 


(1) How many rabbits liked none of the three foods? 
(2) How many rabbits liked all three of the foods? 


Exercise 7.6.9. A new drug was tested on 40 people to see if it cured any or all 
of dandruff, ingrown toenails and halitosis. The result of the test showed that 13 
people were cured of dandruff, 27 were cured of ingrown toenails, 23 were cured of 
halitosis, 10 were cured of dandruff and ingrown toenails, 8 were cured of dandruff 
and halitosis, 16 were cured of ingrown toenails and halitosis, and 7 were cured of 
all three problems. How many people were not cured of anything? 


Exercise 7.6.10. A newspaper report claims that a survey of 100 computer hackers 
showed that 36 read Geek Magazine, 56 read Nerd Newsletter, 38 read Wonk Weekly, 
11 read Geek and Nerd, 10 read Geek and Wonk, 18 read Nerd and Wonk, 5 read 
all three, and 7 read none. A hacker who read the newspaper article doubted that the 
purported survey was actually taken. Was she right? 


Exercise 7.6.11. Find the number of integers from 1 to 100,000 that are not divisible
by any of 2, 5, 11 or 67.


Exercise 7.6.12. [Used in Theorem 7.7.12.] Let I be a non-empty set, and let {A_i}_{i∈I}
be a family of sets indexed by I. Suppose that I is finite, and that A_i is finite for all
i ∈ I. Show that Theorem 7.6.9 (3) can be rewritten as


7.7 Counting: Permutations and Combinations


In this section we are concerned with problems involving the choice of some objects 
out of a larger collection of objects, for example choosing cards out of a deck, or peo- 
ple out of a classroom. In some problems the order of choosing matters, for example 
in choosing a president, vice-president and secretary for a three-person committee, 
while in other problems order does not matter, for example choosing a five-card 
poker hand out of a deck of cards. As in the previous section, the material here is 
quite standard, but our approach is a bit less so. For a standard discussion of these 
topics see [Bog90] and [Rob84]. 

We start with choosing where the order of choosing matters; we have three types 
of problems of this sort. First, we saw in Example 7.6.2 (2) an example of choosing 
2 people out of 6 where order matters. Second, suppose that the same six-person 
committee decides to select someone to stuff envelopes and someone to make coffee; 
the same person could fill both of these new positions. How many ways could these 
two positions be filled? Once again by the Product Rule there are 6 · 6 = 36 choices
for the two positions. Finally, suppose that the members of the committee decide to 
line up for a group photograph. How many ways can this happen? Here we would 
have to use the Product Rule repeatedly, which seems correct informally, though it 
would take proof by induction to be rigorous. There are 6 choices for the person on 
the left, then 5 choices for the person next to her, then 4 choices after that and so on. 
All told, there are 6 · 5 · 4 · 3 · 2 · 1 = 720 possibilities.

The general formulas for solving the above three types of problems are as fol- 
lows. In two of the following formulas we make use of factorials, which were dis- 
cussed in Example 6.4.4 (1). That example defined n! for all n € N, though it did 
not apply to n = 0. For convenience we define 0! = 1, which might seem strange to 
the reader who has not encountered it previously, but it works out very nicely, and it 
allows us to avoid some special cases in the statements of theorems. Recall also the 
formula (n + 1)! = (n + 1) · n! for all n ∈ N.


Fact 7.7.1 (Counting Rules—Permutations). Let k, n ∈ N ∪ {0}. Suppose that 0 ≤
k ≤ n.


1. The number of ways of choosing k objects out of n objects, where order matters
and where each object can be chosen more than once, is n^k.

2. The number of ways of choosing k objects out of n objects, where order matters
and where each object can be chosen only once, is n!/(n − k)!.

3. The number of ways of arranging n objects, where order matters, is n!.


Part (2) of Counting Rules—Permutations leads us to the following definition.


Definition 7.7.2. Let k, n ∈ N ∪ {0}. Suppose that 0 ≤ k ≤ n. The number of permutations
of n elements taken k at a time, denoted P(n,k), is defined by

P(n,k) = n!/(n − k)!. △


There are other common notations for the number of permutations of n elements
taken k at a time, for example _nP_k.
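
For small n and k the three formulas in Fact 7.7.1 can be compared with explicit lists of choices. The Python sketch below does this for n = 5 and k = 3; the five objects and the helper name P are assumptions made for illustration.

```python
from itertools import permutations, product
from math import factorial


def P(n, k):
    """Number of permutations of n elements taken k at a time, for 0 <= k <= n."""
    return factorial(n) // factorial(n - k)


objects = ["a", "b", "c", "d", "e"]
n, k = len(objects), 3

with_repetition = list(product(objects, repeat=k))   # ordered, repeats allowed
without_repetition = list(permutations(objects, k))  # ordered, no repeats
arrangements = list(permutations(objects))           # all n objects in a row

print(len(with_repetition), n ** k)
print(len(without_repetition), P(n, k))
print(len(arrangements), factorial(n))
```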


Example 7.7.3. 


(1) The license plates in a certain state have 7 letters. How many different license 
plates can be made if all letters are allowed? We need to choose 7 letters out of 26, 
where the order of selection matters and where each letter can be chosen more than 
once. Because there are 26 letters, by Part (1) of Counting Rules—Permutations we 
know that there are 26^7 = 8,031,810,176 possible license plates.

(2) A 10-person board wishes to select an executive committee consisting of a 
chair, a vice-chair and a secretary; no person may fill more than one of these po- 
sitions. How many possible executive committees are there? We need to choose 
3 people out of 10, where the order of selection matters and where each person 
can be chosen only once. By Part (2) of Counting Rules—Permutations there are 
P(10,3) = 10!/(10 − 3)! = 720 possibilities.

(3) Four women and three men go to the theater together, and all sit in a row. 
How many ways can they be seated if the three men want to sit together in the three 
seats closest to the aisle? Here we need to use the Product Rule from the previous 
section as well as Part (3) of Counting Rules—Permutations. By the latter, there are 
4! ways for the women to be seated, and there are 3! ways for the men to be seated. 
By the Product Rule, there are a total of 4! · 3! = 24 · 6 = 144 possible seatings. ♦
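
The three numbers in Example 7.7.3 are easy to recompute. The short Python sketch below uses math.perm and math.factorial from the standard library (math.perm requires Python 3.8 or later); the variable names are chosen only for illustration.

```python
from math import factorial, perm

license_plates = 26 ** 7                    # Part (1): 7 letters, repeats allowed
executive_committees = perm(10, 3)          # Part (2): P(10, 3) = 10!/(10 - 3)!
seatings = factorial(4) * factorial(3)      # Part (3): 4! ways for women, 3! for men

print(license_plates, executive_committees, seatings)
```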


Counting Rules—Permutations are usually justified by repeated application of 
the Product Rule (via an often unstated use of proof by induction), but, as was the 
case with “ways that things can happen,” the notion of “choosing objects” is not 
entirely rigorous, and does not directly fit into our framework of sets, functions, 
relations and the like. We can reformulate the concept of choosing objects, where 
order matters, in terms of sets of functions. Recall the notation ℱ(A,B), ℐ(A,B) and
ℬ(A,B) from Section 4.5.

Let k, n ∈ N ∪ {0}. Suppose that we want to find the number of ways of choosing
k things out of n where order matters, and where each object can be chosen more than
once. Let B be a finite set such that |B| = n. Then we want to identify an element of
B as the first chosen element, and an element of B as the second chosen element and
so on, ending with an element of B as the k-th chosen element. We could express such
a collection of choices as a function f: {1, …, k} → B. Hence, the collection of all
possible ways of choosing k things out of n where order matters, and where each
object can be chosen more than once, would be ℱ({1, …, k},B), and the number of
ways of choosing is |ℱ({1, …, k},B)|. Now let A be a finite set such that |A| = k.
Then A ∼ {1, …, k} by Corollary 6.6.3, and therefore by Lemma 4.5.3 we see that
ℱ({1, …, k},B) ∼ ℱ(A,B), which in turn implies that |ℱ({1, …, k},B)| = |ℱ(A,B)|.
This last number is what we need to compute to solve our original counting problem.
Similarly, to find the number of ways of choosing k things out of n where order
matters, and where each object can be chosen only once, we need to find |ℐ(A,B)|;
to find the number of ways of arranging n things, where order matters, we need to
find |ℬ(A,B)| for the special case where |A| = |B|. The values of |ℱ(A,B)|, |ℐ(A,B)|
and |ℬ(A,B)| depend only upon k = |A| and n = |B|, and not on the particular choice
of the sets A and B, as can be seen by using Lemma 4.5.3 and Exercise 4.5.9. The
following theorem gives formulas for |ℱ(A,B)|, |ℐ(A,B)| and |ℬ(A,B)| in terms of
|A| and |B|.
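
To make the reformulation concrete, one can list all functions A → B for small sets and pick out the injective and bijective ones. The Python sketch below represents a function as a dictionary from A to B; this representation and the helper names are assumptions made for illustration, and the counts are compared with the informal rules of Fact 7.7.1.

```python
from itertools import product
from math import factorial


def all_functions(A, B):
    """List every function A -> B, each as a dict sending a in A to its image in B."""
    A = list(A)
    return [dict(zip(A, images)) for images in product(B, repeat=len(A))]


def is_injective(f):
    return len(set(f.values())) == len(f)


def is_bijective(f, B):
    return is_injective(f) and set(f.values()) == set(B)


A, B = ["a1", "a2"], [1, 2, 3]                 # k = |A| = 2, n = |B| = 3
functions = all_functions(A, B)
injections = [f for f in functions if is_injective(f)]
bijections = [f for f in functions if is_bijective(f, B)]

C = ["x", "y", "z"]                            # |C| = |B| = 3
bijections_CB = [f for f in all_functions(C, B) if is_bijective(f, B)]

print(len(functions), len(injections), len(bijections), len(bijections_CB))
```

Note that all_functions([], B) returns a list containing only the empty dictionary, which matches the convention that there is exactly one function with empty domain.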


Theorem 7.7.4 (Counting Rules—Permutations). Let A and B be sets. Suppose 
that A and B are finite. 


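1. The set ℱ(A,B) is finite. If A = ∅ and B = ∅, then |ℱ(A,B)| = 1. If A ≠ ∅ or
B ≠ ∅, then |ℱ(A,B)| = |B|^{|A|}.
2. The set ℐ(A,B) is finite. If |A| > |B|, then |ℐ(A,B)| = 0. If |A| ≤ |B|, then
|ℐ(A,B)| = |B|!/(|B| − |A|)!.
3. The set ℬ(A,B) is finite. If |A| ≠ |B|, then |ℬ(A,B)| = 0. If |A| = |B|, then
|ℬ(A,B)| = |B|!.
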
Proof. 


(1). First, suppose that A = ∅ and B = ∅. Then ℱ(A,B) = {∅}, as remarked in
Example 4.5.2 (1), and hence ℱ(A,B) is finite, and |ℱ(A,B)| = 1.

Second, suppose that A ≠ ∅ or B ≠ ∅. If A = ∅, then once again ℱ(A,B) = {∅}, and
therefore ℱ(A,B) is finite, and |ℱ(A,B)| = 1 = |B|^0 = |B|^{|A|}. Now suppose that A ≠ ∅. Then
|A| ≥ 1. For convenience, let k = |A| and n = |B|. We proceed by induction on k.
Suppose first that k = 1, and that n ∈ N is any number. It follows from Exercise 4.5.2
that ℱ(A,B) ∼ B, and therefore by Corollary 6.6.3 we see that |ℱ(A,B)| = |B| = n =
n^1.

Now assume that the result holds for some k ∈ N, and for all n ∈ N ∪ {0}. Let
m ∈ N ∪ {0}. We will show the result for k + 1 and m. If m = 0, then B = ∅, and
because A ≠ ∅, then ℱ(A,B) = ∅ by Example 4.5.2 (1), and so ℱ(A,B) is finite, and
|ℱ(A,B)| = 0 = m^{k+1}. Now suppose that m ≥ 1. Let a ∈ A and b ∈ B, and let F_{a,b} =
{f ∈ ℱ(A,B) | f(a) = b}. By Exercise 4.5.10 (1) we see that F_{a,b} ∼ ℱ(A − {a},B),
and therefore |F_{a,b}| = |ℱ(A − {a},B)|. Because |A − {a}| = k, it follows from the
inductive hypothesis that |ℱ(A − {a},B)| = m^k. Therefore |F_{a,b}| = m^k. Observe that
ℱ(A,B) = ⋃_{c∈B} F_{a,c} and that the family of sets {F_{a,c}}_{c∈B} is pairwise disjoint. It then
follows from Theorem 7.6.9 (2) that

\[
|\mathcal{F}(A,B)| = \sum_{c \in B} |F_{a,c}| = \sum_{c \in B} m^{k} = m \cdot m^{k} = m^{k+1}.
\]

(2). Left to the reader in Exercise 7.7.19.

(3). First suppose that |A| ≠ |B|. Then by Corollary 6.6.3 we see that A ≁ B.
Hence there is no bijective function A → B, and therefore ℬ(A,B) = ∅, which implies
that |ℬ(A,B)| = 0. Now suppose that |A| = |B|. It follows from Exercise 6.6.4 that
ℬ(A,B) = ℐ(A,B). By Part (2) of this theorem we deduce that |ℬ(A,B)| = |ℐ(A,B)| =
|B|!/(|B| − |A|)! = |B|!.

We now turn to problems where the order of the chosen objects does not matter.
We have two types of problems. First, you choose five cards out of a deck of cards. 
How many ways can this happen? Second, suppose that you go to a shoe store, and 
they have six pairs in your size. You might buy anywhere from none of the pairs to 
all six of them. How many choices can you make? We cannot solve these problems 
by direct application of the Sum Rule and Product Rule (though it is possible to do 
so indirectly); instead, we use the following formulas. 


Fact 7.7.5 (Counting Rules—Combinations). Let n ∈ N ∪ {0}.


1. Let k ∈ N ∪ {0}. Suppose that 0 ≤ k ≤ n. The number of ways of choosing k
objects out of n objects, where order does not matter and where each object
can be chosen only once, is n!/(k!(n − k)!).

2. The number of ways of choosing an unspecified number of objects out of n
objects, where order does not matter and where each object can be chosen
only once, is 2^n.


The formula given in Part (1) of Counting Rules—Combinations turns out to be 
useful in many parts of mathematics, not only in simple counting problems. 
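
As an informal sanity check of the two counting rules, the following Python sketch compares the formulas with a brute-force listing of all selections. The helper name `count_choices` and the sample value n = 6 (the six pairs of shoes mentioned above) are illustrative assumptions, not part of the text.

```python
from itertools import combinations
from math import comb, factorial

def count_choices(n, k):
    """Count the k-element selections from n objects by listing them all."""
    return sum(1 for _ in combinations(range(n), k))

n = 6
for k in range(n + 1):
    # Part (1): the listing agrees with n! / (k! (n - k)!).
    formula = factorial(n) // (factorial(k) * factorial(n - k))
    assert count_choices(n, k) == formula == comb(n, k)

# Part (2): choosing an unspecified number of objects.
total = sum(count_choices(n, k) for k in range(n + 1))
print(total)  # 64, which is 2**6
```

A finite check of this kind is of course not a proof; it only confirms that the formulas behave as claimed for one small value of n.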


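Definition 7.7.6. Let $n \in \mathbb{N} \cup \{0\}$, and let $k \in \mathbb{Z}$. The number of combinations of
$n$ elements taken $k$ at a time, denoted $\binom{n}{k}$, is defined by

\[
\binom{n}{k} =
\begin{cases}
\dfrac{n!}{k!(n-k)!}, & \text{if } 0 \le k \le n \\
0, & \text{if } k < 0 \text{ or } k > n.
\end{cases}
\]

The number $\binom{n}{k}$ is called the binomial coefficient of $n$ and $k$. △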


Other common notations for $\binom{n}{k}$ are $C(n,k)$ and ${}_nC_k$. Observe that if $k, n \in \mathbb{N} \cup \{0\}$
and 0 <k <n, then (7) = ae). Although the formula for (/!) contains a fraction in 
its definition, it will always be the case that $\binom{n}{k}$ is an integer, because by Theorem 7.7.10
it is equal to the number of elements of a certain set. We will shortly see
why the term “binomial coefficient” is used.


Example 7.7.7. 


(1) The Portland Society annual meeting has 11 people from Portland, Maine, 
and 9 people from Portland, Oregon. The meeting needs to elect a five-person steer- 
ing committee. One faction at the meeting wants to allow any five people to be 
elected, while the other faction wants to have either 3 Mainers and 2 Oregonians, 
or vice versa. How many possible committees could be elected by each of these 
methods? 

For the first method, because there are a total of 20 people at the meeting, and
because the order of the members of the committee does not matter, by Part (1) of
the Combinations Rule there are $\binom{20}{5} = 15{,}504$ possible committees.

The second method has two exclusive cases, and by the Sum Rule we will add
the number of possibilities for the two cases. First, suppose that the committee has 3
Mainers and 2 Oregonians. Then there are $\binom{11}{3}$ possible choices of the Maine members
of the committee, and for each of these choices, there are $\binom{9}{2}$ possible choices
for the Oregon members. By the Product Rule there are $\binom{11}{3} \cdot \binom{9}{2}$ possible steering
committees with 3 Mainers and 2 Oregonians. Similarly, there are $\binom{11}{2} \cdot \binom{9}{3}$ possible
steering committees with 2 Mainers and 3 Oregonians. Combining the two cases
implies that there are $\binom{11}{3} \cdot \binom{9}{2} + \binom{11}{2} \cdot \binom{9}{3} = 165 \cdot 36 + 55 \cdot 84 = 10{,}560$ possible
committees.

(2) You pass by a pizza shop that advertises that it has over 1000 varieties of 
pizza, and you want to determine whether this is false advertising. All pizzas in this 
shop have cheese, and they may have any combination of up to 10 toppings, for 
example pepperoni, broccoli, mushrooms, etc. Any type of pizza corresponds to a 
choice of toppings, which is a choice of anywhere from 0 to 10 of the 10 toppings 
(choosing 0 toppings corresponds to a plain cheese pizza). By Part (2) of the Combinations
Rule we see that the power set of a 10-element set has $2^{10} = 1024$ elements.
Therefore there are indeed over 1000 varieties of pizza (that some of them might be 
unpalatable is another matter). 

(3) An important use of counting techniques is to compute probabilities. Al- 
though the computation of probabilities can be quite complicated, and is the subject 
of its own branch of mathematics (see [Pit93] or [Ros10] for introductory proba- 
bility), it is possible to compute probabilities in some elementary cases by using 
binomial coefficients. When a number of distinct events can occur with equal like- 
lihood, then the probability of an event is the ratio of the number of ways the event 
can occur to the number of ways all possibilities can occur. For example, we will cal- 
culate the probability for a flush in five-card poker, which means that a player draws 
five cards from a deck of cards, and all the cards turn out to be from the same suit. 
Because the order of cards does not matter, the total number of different five-card
hands is $\binom{52}{5} = 2{,}598{,}960$. To compute the number of possible flushes, we observe
that there are four suits in a deck of cards, and for each suit we need to choose 5
cards out of the 13 cards in the suit. Using the Product Rule, the number of flushes is
therefore $4 \cdot \binom{13}{5} = 5148$. The probability of a flush is therefore $\frac{5148}{2{,}598{,}960} \approx 0.00198$.
The probabilities of other poker hands can be computed similarly.

(4) Probability calculations sometimes yield rather counterintuitive results; we 
discuss here a well-known example of such a result. Suppose that we choose n ran- 
dom people, and then ask them their birthdays. What is the probability that at least 
two of the people have the same birthday? The probability depends upon the number 
n, and clearly the probability is larger for larger values of n. To guarantee that the 
probability is 1, which means that the desired outcome will definitely happen, we 
would need to have $n \ge 366$ (for simplicity, we will ignore leap years). What is the
smallest value of n such that the probability of at least two people having the same 
birthday is 0.5, which means that it is a one in two likelihood? 

Suppose that $1 \le n \le 365$. We then think of having $n$ people, and we assign
to each of these people a birthday. Such an assignment is the same as choosing $n$
numbers from the set $\{1, \ldots, 365\}$, where order matters and where each number can
be chosen more than once. We want to count how many such choices there are,
and then out of those we want to see how many choices have at least two chosen
numbers the same. By Part (1) of Counting Rules—Permutations there are $365^n$ ways
of assigning birthdays to $n$ people. Of these $365^n$ possibilities, we know by Part (2)
of Counting Rules—Permutations that there are $\frac{365!}{(365-n)!}$ possible ways of assigning
birthdays to the $n$ people such that no two people have the same birthday. It follows
that there are $365^n - \frac{365!}{(365-n)!}$ possible ways of assigning birthdays to the $n$ people
such that at least two people have the same birthday. Hence, the probability of having
at least two people out of $n$ people with the same birthday, denoted $P_n$, is

\[
P_n = \frac{365^n - \frac{365!}{(365-n)!}}{365^n} = 1 - \frac{365!}{365^n (365-n)!}.
\]

We can compute these probabilities using a calculator, obtaining for example that
$P_2 \approx 0.0027$ and $P_3 \approx 0.0082$. There is no direct way of solving the inequality $P_n \ge 0.5$,
but we can solve the problem by brute force by computing $P_4$, $P_5$, and so on until
we first find a value that is greater than or equal to 0.5. Such a calculation, which
is easy with a computer, yields $P_{22} \approx 0.476$ and $P_{23} \approx 0.507$, which says that if 23
people are randomly chosen, there is roughly a 50% chance that two people will
have the same birthday, and that 23 is the minimum number of people needed. The
number 23 is somewhat counterintuitive, given how much smaller it is than 365.
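
The numerical claims in this example can be reproduced with a short Python sketch. The variable and function names below (for instance `birthday_probability`) are illustrative choices made here, not notation from the text, and the search simply tries n = 1, 2, 3, ... in the brute-force spirit described above.

```python
from math import comb, perm

# Parts (1)-(3): direct evaluations of binomial coefficients.
committees_any = comb(20, 5)
committees_split = comb(11, 3) * comb(9, 2) + comb(11, 2) * comb(9, 3)
pizzas = 2 ** 10
flush_probability = 4 * comb(13, 5) / comb(52, 5)

def birthday_probability(n, days=365):
    """P_n: probability that at least two of n people share a birthday."""
    # perm(days, n) = days! / (days - n)! counts assignments with no repeats.
    return 1 - perm(days, n) / days ** n

# Part (4): brute-force search for the smallest n with P_n >= 0.5.
smallest = next(n for n in range(1, 366) if birthday_probability(n) >= 0.5)

print(committees_any, committees_split, pizzas)   # 15504 10560 1024
print(round(flush_probability, 5))                # 0.00198
print(smallest, round(birthday_probability(22), 3),
      round(birthday_probability(23), 3))         # 23 0.476 0.507
```

The function `math.perm` requires Python 3.8 or later; on older versions the same quantity could be computed as a product of the factors 365, 364, and so on.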


To give a rigorous treatment of Counting Rules—Combinations, we look at the 
number of subsets of a given finite set, either subsets of a fixed size or of arbitrary 
size. We start with the following definition. 


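Definition 7.7.8. Let $A$ be a set, and let $k \in \mathbb{Z}$. Let $\mathcal{P}_k(A)$ denote the family of all
subsets of $A$ with $k$ elements, that is, the family

\[
\mathcal{P}_k(A) = \{S \in \mathcal{P}(A) \mid |S| = k\}. \qquad \triangle
\]

Example 7.7.9. Let $A = \{a,b,c\}$. We saw in Example 3.2.9 (2) that the subsets of
$A$ are $\emptyset$, $\{a\}$, $\{b\}$, $\{c\}$, $\{a,b\}$, $\{a,c\}$, $\{b,c\}$ and $\{a,b,c\}$. Then $|\mathcal{P}_0(A)| = 1$, and
$|\mathcal{P}_1(A)| = 3$, and $|\mathcal{P}_2(A)| = 3$, and $|\mathcal{P}_3(A)| = 1$, and $|\mathcal{P}_k(A)| = 0$ if $k < 0$ or $k > 3$. ♦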
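
The counts in Example 7.7.9 can be reproduced by listing the subsets directly. In the sketch below, the function name `subsets_of_size` is a hypothetical stand-in for the family of k-element subsets, and Python's `frozenset` is used so that subsets can themselves be collected in a set.

```python
from itertools import combinations

def subsets_of_size(A, k):
    """Return the family of all k-element subsets of the finite set A."""
    if k < 0:
        return set()  # no subset has a negative number of elements
    return {frozenset(S) for S in combinations(A, k)}

A = {"a", "b", "c"}
# Sizes of the families for k = -1, 0, 1, 2, 3, 4.
sizes = [len(subsets_of_size(A, k)) for k in range(-1, 5)]
print(sizes)  # [0, 1, 3, 3, 1, 0]
```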


The first part of the following theorem gives a formula for $|\mathcal{P}_k(A)|$ when $A$ is
finite, and the second part of the theorem gives a formula for $|\mathcal{P}(A)|$ when $A$ is finite,
formalizing a fact that was stated without proof in Section 3.2. The proof of the
second part of the theorem is much shorter than the first, because we can make use 
of something we proved about sets of functions in Section 4.5; the intuitive idea of 
the proof is that each subset of a given set can be specified by assigning each element 
of the set either 1 or 0, depending upon whether or not it is in the subset. 
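
The intuitive idea just described, that a subset is the same thing as an assignment of 0 or 1 to each element, can be illustrated as follows. This is only a sketch for a three-element set; the name `subset_from_indicator` is an assumption introduced here for illustration.

```python
from itertools import product

def subset_from_indicator(elements, bits):
    """Turn an assignment of 0 or 1 to each element into the subset it describes."""
    return frozenset(x for x, bit in zip(elements, bits) if bit == 1)

elements = ["a", "b", "c"]
# Every function from the set to {0, 1} is a tuple of bits, one per element.
indicators = list(product([0, 1], repeat=len(elements)))
subsets = {subset_from_indicator(elements, bits) for bits in indicators}

# Distinct assignments give distinct subsets, so the two counts agree.
print(len(indicators), len(subsets))  # 8 8
```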


Theorem 7.7.10 (Counting Rules—Combinations). Let $A$ be a set. Suppose that $A$
is finite.

1. Let $k \in \mathbb{Z}$. The set $\mathcal{P}_k(A)$ is finite. If $k < 0$ or $k > |A|$, then $|\mathcal{P}_k(A)| = 0$. If
$0 \le k \le |A|$, then

\[
|\mathcal{P}_k(A)| = \binom{|A|}{k}.
\]

2. The set $\mathcal{P}(A)$ is finite, and $|\mathcal{P}(A)| = 2^{|A|}$.

Proof. We prove the two parts of the theorem in reverse order.

(2). We saw in Theorem 4.5.4 that $\mathcal{P}(A) \sim \mathcal{F}(A, \{0,1\})$, and it follows from
Theorem 7.7.4 (1) that $\mathcal{P}(A)$ is finite. By Corollary 6.6.3 we see that $|\mathcal{P}(A)| =
|\mathcal{F}(A, \{0,1\})|$, and Theorem 7.7.4 (1) then implies that $|\mathcal{F}(A, \{0,1\})| = 2^{|A|}$.


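(1). Regardless of the value of $k$, observe that $\mathcal{P}_k(A) \subseteq \mathcal{P}(A)$, and it then follows
from Part (2) of this theorem and Theorem 6.6.5 (1) that $\mathcal{P}_k(A)$ is finite. To compute
$|\mathcal{P}_k(A)|$, we examine a number of cases.

First, suppose that $A = \emptyset$. Then $\mathcal{P}_0(A) = \{\emptyset\}$ and $\mathcal{P}_k(A) = \emptyset$ when $k \ne 0$. Hence
$|\mathcal{P}_0(A)| = 1 = \binom{0}{0} = \binom{|A|}{0}$, and $|\mathcal{P}_k(A)| = 0$ when $k \ne 0$. Now assume that $A \ne \emptyset$.

If $k < 0$, then it is clear that there are no subsets of $A$ of order $k$, and hence
$\mathcal{P}_k(A) = \emptyset$. Therefore $|\mathcal{P}_k(A)| = 0$. If $k > |A|$, then it follows from Theorem 6.6.5 (3)
that $\mathcal{P}_k(A) = \emptyset$, and hence $|\mathcal{P}_k(A)| = 0$. If $k = 0$, then $\mathcal{P}_0(A) = \{\emptyset\}$, and so
$|\mathcal{P}_0(A)| = 1 = \binom{|A|}{0}$. Now assume that $1 \le k \le |A|$.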

Let $E$ be a set with $k$ elements. Let $\sim$ be the relation on $\mathcal{I}(E,A)$ defined by $f \sim g$
if and only if $f(E) = g(E)$, for all $f, g \in \mathcal{I}(E,A)$. It is straightforward to verify that
$\sim$ is an equivalence relation; the details are left to the reader. We will prove the
following two facts: (1) Each equivalence class of $\mathcal{I}(E,A)$ with respect to $\sim$ has
$|E|!$ elements; and (2) the number of equivalence classes equals $|\mathcal{P}_k(A)|$. Once we
prove these two claims, it will then follow from Corollary 7.6.12 that $|\mathcal{I}(E,A)| =
|\mathcal{P}_k(A)| \cdot |E|!$. By Theorem 7.7.4 (2) we know that $|\mathcal{I}(E,A)| = \frac{|A|!}{(|A| - |E|)!}$, and it will
follow that

\[
|\mathcal{P}_k(A)| = \frac{|A|!}{|E|!\,(|A| - |E|)!} = \frac{|A|!}{k!\,(|A| - k)!} = \binom{|A|}{k}.
\]


For each $h \in \mathcal{I}(E,A)$, let $\hat{h}\colon E \to h(E)$ be defined by $\hat{h}(x) = h(x)$ for all $x \in E$;
because $h$ is injective, then $\hat{h}$ is bijective. Let $f \in \mathcal{I}(E,A)$. Let $\Psi\colon [f] \to \mathcal{B}(f(E), f(E))$
be defined by $\Psi(g) = \hat{g} \circ \hat{f}^{-1}$ for all $g \in [f]$. To see that the definition of $\Psi$
makes sense, let $g \in [f]$. Because $g(E) = f(E)$, then $\Psi(g)$ is indeed a function
$f(E) \to f(E)$. It follows from Exercise 4.4.13 (3) and Lemma 4.4.4 (3) that $\hat{g} \circ \hat{f}^{-1}$
is bijective. Hence $\Psi(g) \in \mathcal{B}(f(E), f(E))$, and we deduce that $\Psi$ is well-defined.
It is left to the reader in Exercise 7.7.20 to show that the function $\Psi$ is bijective. It
follows that $[f] \sim \mathcal{B}(f(E), f(E))$, and therefore by Corollary 6.6.3 we then deduce
that $|[f]| = |\mathcal{B}(f(E), f(E))|$. By Exercise 6.5.4 we see that $|f(E)| = |E|$, and hence
by Theorem 7.7.4 (3) we know that $|[f]| = |\mathcal{B}(f(E), f(E))| = |E|!$, which proves
Fact (1).

Let $\mathcal{I}(E,A)/{\sim}$ denote the set of equivalence classes of $\mathcal{I}(E,A)$ with respect to $\sim$,
as discussed in Section 5.3. Let $\Phi\colon \mathcal{I}(E,A)/{\sim} \to \mathcal{P}_k(A)$ be defined by $\Phi([f]) = f(E)$
for all $f \in \mathcal{I}(E,A)$. We leave it to the reader in Exercise 7.7.20 to show that the
function $\Phi$ is well-defined and bijective, which implies Fact (2).
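
The two facts used in the proof of Part (1) can be observed concretely on a small case. The sketch below assumes a five-element set A and k = 3, and represents an injective function from E = {0, 1, 2} to A as the tuple of its values; these choices, and all the names in the code, are illustrative only.

```python
from itertools import permutations
from math import comb, factorial

A = ["p", "q", "r", "s", "t"]
k = 3  # E = {0, 1, 2}; a tuple (f(0), f(1), f(2)) records an injective f: E -> A

classes = {}
for f in permutations(A, k):
    # f ~ g exactly when the images f(E) and g(E) agree.
    classes.setdefault(frozenset(f), []).append(f)

class_sizes = {len(members) for members in classes.values()}
# Fact (1): every class has k! elements. Fact (2): one class per k-element subset.
print(class_sizes, len(classes))  # {6} 10
```

Here the 60 injective functions split into 10 classes of 6 functions each, in agreement with the count 5!/(3! 2!) = 10.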


The reader might find the proof of Theorem 7.7.10 unsatisfying due to its reliance
on some heavy machinery involving sets of functions. Alternative proofs of
the two parts of the theorem, using proof by induction (and for the first part of the
theorem some of the properties of binomial coefficients given below), but without
sets of functions and equivalence relations, are left to the reader in Exercise 7.7.22
and Exercise 7.7.23.

Some texts use the notation $2^A$ to denote $\mathcal{P}(A)$, whether or not $A$ is finite. This
alternative notation might seem strange, but it does allow for the nice formula $|2^A| = 2^{|A|}$
when $A$ is a finite set.

We have seen a number of problems involving counting in Section 7.6 and the 
present section. Given that most of these problems were not very tricky, the reader 
might have been led to think, mistakenly, that counting problems can usually be 
solved in a simple intuitive fashion, and that we have made a big deal out of nothing. 
As evidence that counting problems are not all straightforward, we present one final 
counting problem, known as the “hat check problem,” which is somewhat trickier 
than the problems we have seen until now, and which makes use of a number of the 
ideas we have learned. 


Example 7.7.11. Suppose that $n$ people check their hats at a theater. The hat check
attendant accidentally loses all the stubs for the hats, and returns the hats at random. 
What is the probability that no one gets her own hat back? As discussed in Exam- 
ple 7.7.7 (3), this probability is the ratio of the number of ways that the hats can be 
returned so that no one gets her own hat back, denoted S(n), to the total number of 
ways that the hats can be randomly returned, denoted T(n). It is easy to compute 
T(n), because this is just the number of ways of arranging n things, where order 
matters. Therefore T(n) =n! by Theorem 7.7.4 (3). 

Computing $S(n)$ is a bit trickier; it is the number of ways of arranging $n$ things,
where order matters, and where nothing stays where it started. Such a rearrangement 
is called a derangement in the combinatorics literature. We can reformulate our problem
in terms of functions, as follows. Let $A$ be a set with $n$ elements, where $n \in \mathbb{N}$.
Then each derangement of $n$ objects corresponds to a bijective function $f\colon A \to A$
such that $f(a) \ne a$ for all $a \in A$. To use standard terminology, a fixed point of a function
$f\colon A \to A$ is an element $x \in A$ such that $f(x) = x$. We are therefore interested in
counting the number of bijective functions $A \to A$ with no fixed points, which we do
in Theorem 7.7.12 following this example. The hard work for the hat check problem 
is in the proof of the theorem. From that theorem we see that 


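\[
S(n) = n! \left( \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + (-1)^n \frac{1}{n!} \right).
\]

It follows that the probability that no one gets her own hat back is

\[
\frac{S(n)}{T(n)} = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + (-1)^n \frac{1}{n!}.
\]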


For example, with six people the probability is approximately 0.36806. It is interesting
to observe that as $n \to \infty$, the probability goes to $\frac{1}{e}$, as can be seen using the
power series for $e^x$ (consult any standard calculus text for this power series). ♦
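
For small values of n the hat check problem can be checked by listing all arrangements. The following sketch compares a brute-force count of derangements with the alternating-sum formula, using exact fractions to avoid rounding; the function names are hypothetical, and the range of n tested is an arbitrary choice.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def derangements_brute_force(n):
    """Count the arrangements of n hats in which nobody gets her own hat."""
    return sum(
        1
        for p in permutations(range(n))
        if all(p[i] != i for i in range(n))
    )

def derangements_formula(n):
    """Evaluate n! (1/2! - 1/3! + ... + (-1)^n / n!) exactly."""
    total = sum(Fraction((-1) ** r, factorial(r)) for r in range(2, n + 1))
    return int(factorial(n) * total)

for n in range(1, 8):
    assert derangements_brute_force(n) == derangements_formula(n)

# With six people: 265 derangements out of 720 arrangements.
probability = Fraction(derangements_formula(6), factorial(6))
print(probability, round(float(probability), 5))  # 53/144 0.36806
```

For comparison, 1/e is approximately 0.36788, so the probability is already close to its limiting value when n = 6.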


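Theorem 7.7.12. Let $A$ be a set. Suppose that $A$ is finite and non-empty. Let $F =
\{f \in \mathcal{B}(A,A) \mid f(a) \ne a \text{ for all } a \in A\}$. Then

\[
|F| = |A|! \left( \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + (-1)^{|A|} \frac{1}{|A|!} \right).
\]

Proof. Let $a \in A$, and let $G_a = \{f \in \mathcal{B}(A,A) \mid f(a) = a\}$. Observe that if $f \in G_a$, it
might also be the case that $f(b) = b$ for some $b \in A$ such that $b \ne a$. Let $B \in \mathcal{P}_p(A)$,
for some $p \in \{1, \ldots, |A|\}$. Then

\[
\bigcap_{b \in B} G_b = \{f \in \mathcal{B}(A,A) \mid f(b) = b \text{ for all } b \in B\}.
\]

It can be verified, similarly to Exercise 4.5.10 (3), that there is a bijective function
from $\bigcap_{b \in B} G_b$ to $\mathcal{B}(A - B, A - B)$. Using Theorem 7.7.4 (3) and Theorem 6.6.5 (2)
we deduce that $\left|\bigcap_{b \in B} G_b\right| = |A - B|! = (|A| - |B|)! = (|A| - p)!$.

Observe that $F = \mathcal{B}(A,A) - \bigcup_{c \in A} G_c$. By Theorem 7.7.4 (3) we know that
$|\mathcal{B}(A,A)| = |A|!$. Using Theorem 6.6.5 (2), Exercise 7.6.12 and Theorem 7.7.10 (1)
we then compute

\begin{align*}
|F| &= \Bigl|\mathcal{B}(A,A) - \bigcup_{c \in A} G_c\Bigr| = |\mathcal{B}(A,A)| - \Bigl|\bigcup_{c \in A} G_c\Bigr| \\
&= |\mathcal{B}(A,A)| + \sum_{r=1}^{|A|} (-1)^r \sum_{K \in \mathcal{P}_r(A)} \Bigl|\bigcap_{k \in K} G_k\Bigr| \\
&= |A|! + \sum_{r=1}^{|A|} (-1)^r \sum_{K \in \mathcal{P}_r(A)} (|A| - r)! \\
&= |A|! + \sum_{r=1}^{|A|} (-1)^r (|A| - r)! \sum_{K \in \mathcal{P}_r(A)} 1 \\
&= |A|! + \sum_{r=1}^{|A|} (-1)^r (|A| - r)! \, |\mathcal{P}_r(A)| \\
&= |A|! + \sum_{r=1}^{|A|} (-1)^r (|A| - r)! \, \frac{|A|!}{r!\,(|A| - r)!} \\
&= |A|! + \sum_{r=1}^{|A|} (-1)^r \frac{|A|!}{r!} \\
&= |A|! - \frac{|A|!}{1!} + \frac{|A|!}{2!} - \frac{|A|!}{3!} + \cdots + (-1)^{|A|} \frac{|A|!}{|A|!} \\
&= |A|! \left( \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + (-1)^{|A|} \frac{1}{|A|!} \right).
\end{align*}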


Having defined the binomial coefficients in Definition 7.7.6, and having seen a 
few applications of them, we conclude this section by proving a few basic properties 
of these numbers. Some additional properties of the binomial coefficients can be 
found in the exercises for this section, and more properties than you ever wanted to 
know about the binomial coefficients (as well as some very clever arguments) are 
found in [GKP94, Chapter 5]. 


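Theorem 7.7.13. Let $n \in \mathbb{N} \cup \{0\}$, and let $k \in \mathbb{Z}$.
1. $\binom{n}{0} = 1$, and $\binom{n}{n} = 1$, and $\binom{n}{1} = n$, and $\binom{n}{n-1} = n$.
2. $\binom{n}{n-k} = \binom{n}{k}$.
3. $\binom{n-1}{k} + \binom{n-1}{k-1} = \binom{n}{k}$, when $n \ge 1$.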
Proof. We will prove Part (3), leaving the rest to the reader in Exercise 7.7.14.

(3). There are a number of cases, depending upon the value of $k$. If $k < 0$, then
$\binom{n-1}{k} + \binom{n-1}{k-1} = 0 + 0 = 0 = \binom{n}{k}$. If $k = 0$, then by Part (1) of this proposition we
see that $\binom{n-1}{0} + \binom{n-1}{-1} = 1 + 0 = 1 = \binom{n}{0}$. Similar calculations show that the equation
holds if $k = n$ or $k > n$. If $1 \le k \le n - 1$, then

\begin{align*}
\binom{n-1}{k} + \binom{n-1}{k-1} &= \frac{(n-1)!}{k!\,(n-1-k)!} + \frac{(n-1)!}{(k-1)!\,(n-1-(k-1))!} \\
&= \frac{(n-1)!}{k(k-1)!\,(n-k-1)!} + \frac{(n-1)!}{(k-1)!\,(n-k)(n-k-1)!} \\
&= \frac{n!}{k!\,(n-k)!} = \binom{n}{k}.
\end{align*}


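Part (3) of the above proposition leads to a convenient way of displaying and
computing the binomial coefficients. Consider the following arrangement of the binomial
coefficients.

\[
\begin{array}{c}
\binom{0}{0} \\[4pt]
\binom{1}{0} \quad \binom{1}{1} \\[4pt]
\binom{2}{0} \quad \binom{2}{1} \quad \binom{2}{2} \\[4pt]
\binom{3}{0} \quad \binom{3}{1} \quad \binom{3}{2} \quad \binom{3}{3} \\[4pt]
\binom{4}{0} \quad \binom{4}{1} \quad \binom{4}{2} \quad \binom{4}{3} \quad \binom{4}{4} \\[4pt]
\binom{5}{0} \quad \binom{5}{1} \quad \binom{5}{2} \quad \binom{5}{3} \quad \binom{5}{4} \quad \binom{5}{5} \\[4pt]
\vdots
\end{array}
\]

Replacing the binomial coefficients with their numerical values, we obtain the following
triangle.

\[
\begin{array}{c}
1 \\
1 \quad 1 \\
1 \quad 2 \quad 1 \\
1 \quad 3 \quad 3 \quad 1 \\
1 \quad 4 \quad 6 \quad 4 \quad 1 \\
1 \quad 5 \quad 10 \quad 10 \quad 5 \quad 1 \\
\vdots
\end{array}
\]

Observe that each entry in the triangle can be computed by adding the two entries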
above it in the previous row, which is equivalent to Theorem 7.7.13 (3). This fact 
allows for easy computation of binomial coefficients with small numbers. The left- 
right symmetry of the triangle is equivalent to Theorem 7.7.13 (2). This triangle of 
binomial coefficients is called Pascal’s triangle, though it was known in China earlier 
than Pascal’s time; see [Ifr85, p. 396] for the history of Pascal’s triangle, and see 
[HHP97, Chapter 6] for an interesting mathematical discussion of Pascal’s triangle 
and its extensions. 
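
The rule that each entry is the sum of the two entries above it translates directly into a short program. The following sketch builds the first few rows of Pascal's triangle using only addition; the function name `pascal_rows` is an illustrative choice.

```python
def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle, starting with row 0."""
    rows = [[1]]
    for _ in range(count - 1):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries above it.
        inner = [prev[i - 1] + prev[i] for i in range(1, len(prev))]
        rows.append([1] + inner + [1])
    return rows[:count]

for row in pascal_rows(6):
    print(row)
# The last row printed is [1, 5, 10, 10, 5, 1].
```

No factorials or divisions are needed, which is why this method is convenient for computing binomial coefficients with small numbers by hand.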

The term “binomial coefficient” comes from the following very important theo- 
rem. 


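Theorem 7.7.14 (Binomial Theorem). Let $n \in \mathbb{N}$, and let $x, y \in \mathbb{R}$. Then

\[
(x+y)^n = \binom{n}{0} x^n + \binom{n}{1} x^{n-1} y + \cdots + \binom{n}{n-1} x y^{n-1} + \binom{n}{n} y^n
= \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i.
\]

Proof. The proof is by induction on $n$. When $n = 1$, then

\[
(x+y)^1 = x + y = \binom{1}{0} x + \binom{1}{1} y = \sum_{i=0}^{1} \binom{1}{i} x^{1-i} y^i.
\]

Now suppose that the result is true for some $n \in \mathbb{N}$. Making use of Theorem 7.7.13
(1) (3) we compute

\begin{align*}
(x+y)^{n+1} &= (x+y)(x+y)^n = (x+y) \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i \\
&= \sum_{i=0}^{n} \binom{n}{i} x^{n-i+1} y^i + \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^{i+1} \\
&= x^{n+1} + \sum_{i=1}^{n} \binom{n}{i} x^{n+1-i} y^i + \sum_{i=1}^{n} \binom{n}{i-1} x^{n+1-i} y^i + y^{n+1} \\
&= x^{n+1} + \sum_{i=1}^{n} \left[ \binom{n}{i} + \binom{n}{i-1} \right] x^{n+1-i} y^i + y^{n+1} \\
&= x^{n+1} + \sum_{i=1}^{n} \binom{n+1}{i} x^{n+1-i} y^i + y^{n+1} \\
&= \sum_{i=0}^{n+1} \binom{n+1}{i} x^{n+1-i} y^i.
\end{align*}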

Combining the Binomial Theorem with Pascal’s triangle, we see, for example,
that $(a+b)^5 = a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5ab^4 + b^5$.
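
The Binomial Theorem is easy to spot-check numerically. The sketch below evaluates the right-hand side of the theorem for a few integer values and compares it with the left-hand side; the function name `binomial_expansion` and the sample values are assumptions made for illustration, and a few numerical checks are of course no substitute for the proof.

```python
from math import comb

def binomial_expansion(x, y, n):
    """Evaluate the sum of C(n, i) * x**(n - i) * y**i for i = 0, ..., n."""
    return sum(comb(n, i) * x ** (n - i) * y ** i for i in range(n + 1))

# Coefficients of (a + b)^5, read off from row 5 of Pascal's triangle.
coefficients = [comb(5, i) for i in range(6)]
print(coefficients)  # [1, 5, 10, 10, 5, 1]

# Compare both sides of the theorem for a few integer inputs.
for x, y, n in [(2, 3, 5), (-2, 5, 4), (1, 1, 7)]:
    assert binomial_expansion(x, y, n) == (x + y) ** n
```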
There are various formulas for sums of binomial coefficients, the simplest of
which is given in the following proposition. Other sums may be found in Exer- 
cise 7.7.16 and Exercise 7.7.17, and even more complicated ones in [GKP94, Chap- 
ter 5]. We give three proofs of this proposition, in order of increasing pleasantness, 
to demonstrate a variety of the techniques we have learned. 


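Theorem 7.7.15. Let $n \in \mathbb{N} \cup \{0\}$. Then

\[
\sum_{i=0}^{n} \binom{n}{i} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n} = 2^n.
\]

Proof. First Proof: This proof is by induction on $n$. If $n = 0$ then $\binom{0}{0} = 1 = 2^0$, and if
$n = 1$ then $\binom{1}{0} + \binom{1}{1} = 1 + 1 = 2 = 2^1$. Now suppose that the result holds for $n \in \mathbb{N}$.
We then use Theorem 7.7.13 (3) and the inductive hypothesis to compute

\begin{align*}
\sum_{i=0}^{n+1} \binom{n+1}{i} &= \sum_{i=0}^{n+1} \left[ \binom{n}{i} + \binom{n}{i-1} \right] = \sum_{i=0}^{n+1} \binom{n}{i} + \sum_{i=0}^{n+1} \binom{n}{i-1} \\
&= \left[ \sum_{i=0}^{n} \binom{n}{i} + \binom{n}{n+1} \right] + \left[ \binom{n}{-1} + \sum_{i=1}^{n+1} \binom{n}{i-1} \right] \\
&= \left[ \sum_{i=0}^{n} \binom{n}{i} + \binom{n}{n+1} \right] + \left[ \binom{n}{-1} + \sum_{i=0}^{n} \binom{n}{i} \right] \\
&= 2^n + 0 + 0 + 2^n = 2 \cdot 2^n = 2^{n+1}.
\end{align*}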


Second Proof: This proof uses Theorem 7.7.10, which interprets the binomial coefficient
as the number of subsets of appropriate size of a given set. Let $A$ be a set with
$n$ elements. Then

\[
\mathcal{P}(A) = \mathcal{P}_0(A) \cup \mathcal{P}_1(A) \cup \cdots \cup \mathcal{P}_n(A).
\]

Moreover, the family of sets $\{\mathcal{P}_i(A)\}_{i=0}^{n}$ is pairwise disjoint. Using Theorem 7.6.9 (2)
we see that

\[
|\mathcal{P}(A)| = |\mathcal{P}_0(A)| + |\mathcal{P}_1(A)| + \cdots + |\mathcal{P}_n(A)|,
\]

and therefore by Theorem 7.7.10 we deduce that

\[
2^n = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n}.
\]


Third Proof: This proof makes use of the Binomial Theorem (Theorem 7.7.14). Because
that theorem holds for all values of $x$ and $y$, it holds in particular when $x = 1$
and $y = 1$. Substituting these values into the Binomial Theorem yields

\[
2^n = (1+1)^n = \sum_{i=0}^{n} \binom{n}{i} 1^{n-i} 1^i = \sum_{i=0}^{n} \binom{n}{i}.
\]

Exercises


Exercise 7.7.1. The alphabet on planet Blort has 11 letters, which are divided into 
two types; there are 8 letters of type one, and 3 letters of type two. 


(1) How many different words can be made with these letters? 

(2) How many different words can be made with these letters if the words all 
have to start with a letter of type one? 

(3) How many different words can be made with these letters if the words are 
required to have all letters of the same type? 


Exercise 7.7.2. The license plates in a certain state have three letters followed by 
three numbers. 


(1) How many different license plates can be made? 
(2) How many different license plates can be made if the license plates all have 
to start with one of PU, FE or GA? 


Exercise 7.7.3. A group of eight brothers and sisters line up to get food at a family 
gathering. 


(1) How many different ways can they line up? 

(2) How many different ways can they line up if the oldest is at the head of the 
line and the youngest is at the end of the line? 

(3) How many different ways can they line up if the oldest and the youngest 
always stand together, with the oldest always ahead of the youngest? 


Exercise 7.7.4. You have five books in Esperanto and five books in Ugaritic, which 
you want to line up on a shelf. 


(1) How many different ways can you line the books up? 

(2) How many different ways can you line the books up if you put all the Es- 
peranto books on the left, and all the Ugaritic books on the right? 

(3) How many different ways can you line the books up if you alternate Esperanto 
and Ugaritic books? 


Exercise 7.7.5. A horse race has eight horses, and the first three places are an- 
nounced. Assume there are no ties. 


(1) How many possible outcomes are there for a single running of the race? 
(2) How many possible outcomes are there for two runnings of the race? 


Exercise 7.7.6. We want to select five distinct letters out of the word MUSHBRAIN 
and write them in a row. 


(1) How many different ways can this selection be done? 

(2) How many different ways can this selection be done if we write three conso- 
nants followed by two vowels? 

(3) How many different ways can this selection be done if we write four conso- 
nants followed by one vowel, or five consonants and no vowels? 
Exercise 7.7.7. A company that solicits magazine subscriptions by phone sells 13
different magazines. Given that any person they call might subscribe to anything 
from no magazines to all 13 of them, how many different possible responses could 
the company receive? 


Exercise 7.7.8. Susan has 15 shirts, from which she might or might not take any on 
an upcoming trip. 


(1) How many possible collections of shirts might she take? 
(2) How many possible collections of shirts might she take if she is definitely 
going to take at least two shirts? 


Exercise 7.7.9. Let $X = \{1,2,3,4\}$. Explicitly list the elements of each of the sets
$\mathcal{P}_0(X)$, $\mathcal{P}_1(X)$, $\mathcal{P}_2(X)$, $\mathcal{P}_3(X)$ and $\mathcal{P}_4(X)$.


Exercise 7.7.10. Xavier has six pairs of cotton pants and four pairs of wool pants. 
He needs to take five pairs of pants on a trip. 


(1) How many possible choices can Xavier make? 
(2) How many possible choices can Xavier make if he is to take three pairs of 
cotton pants and two pairs of wool pants? 


Exercise 7.7.11. The Al Jolson fan club of Flugletown has eight men and five
women, including Mr. and Ms. Atiyah-Singer. The club wants to pick a steering
committee.


(1) How many possible five-person committees can be formed? 

(2) How many possible four- or five-person committees can be formed? 

(3) How many possible four-person committees can be formed if there must be 
two men and two women on the committee? 

(4) How many possible four-person committees can be formed if there must be 
two men and two women on the committee, and not both Mr. and Ms. Atiyah- 
Singer are allowed to be on the committee at the same time? 


Exercise 7.7.12. You choose three cards from a deck of cards. Find the probability 
of drawing each of the following options. 


(1) Three red cards.
(2) A face card.
(3) Three Aces.
(4) Two Queens and one Jack.
(5) Three cards of the same suit.
(6) Three Aces or Three Kings.


Exercise 7.7.13. Expand the following expressions. 
(1) (a+36)°. (2) (2x++)’, 


Exercise 7.7.14. [Used in Proposition 7.7.13.] Prove Theorem 7.7.13 (1) (2). 


Exercise 7.7.15. Let $n, s \in \mathbb{N} \cup \{0\}$, and let $k, s \in \mathbb{Z}$. Prove that the following formulas
hold.
) ({) = E(fo1)» when k # 0. (3) () + (°3') =2°.
(2) (") (Q) = (2) (Cop), when k <n. (4) ("37)-(@) =’. 


Exercise 7.7.16. [Used in Section 7.7.] Let $n, s \in \mathbb{N} \cup \{0\}$. Prove that the following
formulas hold. 


OFC H=Co") 2) Hao = Ci). 
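Exercise 7.7.17. [Used in Section 7.7.] Let $n \in \mathbb{N}$.

(1) Prove that $\sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0$.

(2) Prove that

\[
\sum_{\substack{k \text{ even} \\ 0 \le k \le n}} \binom{n}{k} = \sum_{\substack{k \text{ odd} \\ 0 \le k \le n}} \binom{n}{k}.
\]

(3) Let $A$ be a non-empty set. Suppose that $A$ is finite. Let $\mathcal{P}_E(A)$, respectively
$\mathcal{P}_O(A)$, denote the family of all subsets of $A$ with an even, respectively odd,
number of elements. Prove that $|\mathcal{P}_E(A)| = |\mathcal{P}_O(A)|$.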


Exercise 7.7.18. Certain diagonals in Pascal’s triangle are indicated in Figure 7.7.1. 
Make a conjecture for a formula for the sums of the entries along these diagonals; 
state your formula in terms of binomial coefficients. Prove your conjecture. Recall 
the Fibonacci numbers discussed in Section 6.4. 


Fig. 7.7.1. (Pascal’s triangle with certain diagonals indicated.)


Exercise 7.7.19. [Used in Theorem 7.7.4.] Prove Theorem 7.7.4 (2). 


Exercise 7.7.20. [Used in Theorem 7.7.10.] Let $\Psi$ and $\Phi$ be the functions defined in
the proof of Theorem 7.7.10. Prove that the function $\Phi$ is well-defined, and that both
$\Phi$ and $\Psi$ are bijective.

Exercise 7.7.21. Let $A$ and $B$ be sets. Prove that $A \sim B$ implies that $\mathcal{P}_k(A) \sim \mathcal{P}_k(B)$,
for all $k \in \mathbb{N} \cup \{0\}$. Observe that Theorem 7.7.10 cannot be used here, because $A$ and
$B$ are not required to be finite.


Exercise 7.7.22. [Used in Section 7.7.] Prove Theorem 7.7.10 (1) directly by induction
on $|A|$, without using sets of functions. Only the case $0 \le k \le |A|$ needs to be
treated. Use Exercise 7.7.21.


Exercise 7.7.23. [Used in Section 7.7.] Prove Theorem 7.7.10 (2) directly by induc- 
tion on |A|, without making use of Theorem 4.5.4. 


Exercise 7.7.24. Let $A$ and $B$ be sets, and let $f\colon A \to B$ be a function. Suppose that
$f$ has a left inverse but not a right inverse. Suppose that $A$ and $B - f(A)$ are both
finite sets. How many left inverses does $f$ have? Prove your answer.


7.8 Limits of Sequences 


In the previous sections of this chapter we saw various topics that had an algebraic 
flavor. In the present section, by contrast, where we discuss limits of sequences, 
the material is from analysis. See any introductory real analysis text, for example
[Blo11, Chapter 8], for a detailed treatment of limits of sequences.

The proofs in this section, while not longer than those in the previous sections 
of this chapter, are very different from what we have seen so far, and are often con- 
sidered a bit trickier upon first encounter. The source of this trickiness is the double 
quantifier in the definition of limits of sequences, as we will soon see. Careful at- 
tention to quantifiers is important in the formulation of all proofs, and is even more 
important here. 

Before proceeding to the topic of this section, we note that in addition to all the 
basic algebraic properties of the real numbers that we have used throughout this text, 
we need for the present section two additional properties, which are stated as Theo- 
rem A.2 and Theorem A.3 in the Appendix. The reader, who should review these two 
theorems before proceeding, is most likely familiar with these facts informally, but 
we need to be quite explicit in their use if we want rigorous proofs about sequences. 

In our discussion of limits of sequences we will use the following phraseology.
We will regularly need to select an arbitrary positive real number, often denoted $\varepsilon$.
Formally, we should write “let $\varepsilon \in \mathbb{R}$, and suppose that $\varepsilon > 0$.” However, in order to
avoid that cumbersome formulation, we will stick with the standard, albeit not quite
precise, phrase “let $\varepsilon > 0$.”

We have already seen sequences in a few places in this text. In Example 4.5.2 (4)
we defined a sequence of real numbers formally as a function $f\colon \mathbb{N} \to \mathbb{R}$. Informally,
we write a sequence as $c_1, c_2, c_3, \ldots$, where we could convert this informal notation
to the formal approach by defining a function $g\colon \mathbb{N} \to \mathbb{R}$ by letting $g(1) = c_1$, and
$g(2) = c_2$, and so on. For convenience, we summarize this definition of sequences as
follows.

Definition 7.8.1. A sequence of real numbers (also called a sequence in $\mathbb{R}$) is a
function $f\colon \mathbb{N} \to \mathbb{R}$. If $f\colon \mathbb{N} \to \mathbb{R}$ is a sequence, and if $c_i = f(i)$ for all $i \in \mathbb{N}$, then
we will write either $c_1, c_2, c_3, \ldots$ or $\{c_n\}_{n=1}^{\infty}$ to denote the sequence. Each number $c_n$,
for $n \in \mathbb{N}$, is called a term of the sequence $\{c_n\}_{n=1}^{\infty}$. △

It is important to distinguish between the concept of a “sequence” and the related, 
but not identical, concept of a “series.” In non-mathematical usage these two words 
are often used interchangeably, but not in mathematical terminology. Intuitively, a 
sequence of real numbers is a collection of numbers of which there is a first, a second,
a third and so on, with one real number for each element of $\mathbb{N}$. For example,

\[
\frac{1}{1^2}, \ \frac{1}{2^2}, \ \frac{1}{3^2}, \ \frac{1}{4^2}, \ \ldots
\]

is a sequence. By contrast, a series is a “sum” of the terms in a sequence, for example

\[
\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots.
\]

The word “sum” is in quotes because, although we can write such an infinite sum, 
it is not clear whether such a sum actually adds up to a finite amount (this par- 
ticular example does). We will not discuss series here, though we note that they 
are a very important topic in real analysis and in applications of mathematics; see 
[Blo11, Chapter 9] for basic information about series.
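
The difference between a sequence and the associated series can be seen numerically. In the sketch below, `terms` holds the first 10,000 terms of the sequence above and `partial_sums` holds the running totals; both names, and the cutoff of 10,000, are illustrative assumptions. The terms shrink toward 0, while the running totals level off near 1.64, which is consistent with the remark that this particular sum is finite (its value is known to be $\pi^2/6$, a fact not needed here).

```python
from itertools import accumulate

# The sequence 1/1^2, 1/2^2, 1/3^2, ...
terms = [1 / n ** 2 for n in range(1, 10001)]
# Running totals of the series: s_1, s_2, s_3, ...
partial_sums = list(accumulate(terms))

print(terms[:4])
print(partial_sums[9], partial_sums[99], partial_sums[9999])
```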

In addition to seeing the formal definition of sequences in Section 4.5, we also 
saw the use of Definition by Recursion to define sequences in Section 6.4. For ex- 
ample, we used Definition by Recursion to define the Fibonacci sequence, which 
starts 


1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …


What concerns us at present is not how sequences are defined, but what happens to
the terms of a sequence $\{c_n\}_{n=1}^{\infty}$ as $n$ gets larger and larger. Rather than repeatedly
saying the cumbersome phrase “$n$ gets larger and larger,” we will use the slightly
shorter, and very standard, phrase “$n$ goes to $\infty$,” which we write with the notation
“$n \to \infty$.” Keep in mind that there is no real number “$\infty$,” and that this symbol is
simply shorthand for allowing us to take larger and larger numbers without bound.
For example, if we look at the terms of the Fibonacci sequence, they clearly grow
without bound as $n$ goes to $\infty$. On the other hand, in Exercise 6.4.14 we looked at the
sequence of the successive ratios of Fibonacci numbers, that is, the numbers

\[
\frac{1}{1}, \ \frac{2}{1}, \ \frac{3}{2}, \ \frac{5}{3}, \ \frac{8}{5}, \ \frac{13}{8}, \ \ldots
\]

In that exercise, it was noted that as $n$ goes to $\infty$, the terms in this sequence get closer
and closer to the number 1.618…. The reader is urged to check the plausibility of this
claim by calculating the values of the first few terms of this sequence in decimals; a
proof of the claim, which is not important to us here, is discussed in Exercise 6.4.14.
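
The suggested plausibility check takes only a few lines of Python. The helper `fibonacci` below is a hypothetical name, and the choice of 20 terms is arbitrary; the computation illustrates the claim but does not prove it.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers 1, 1, 2, 3, 5, ..."""
    values = [1, 1]
    while len(values) < count:
        values.append(values[-1] + values[-2])
    return values[:count]

fib = fibonacci(20)
# Successive ratios 1/1, 2/1, 3/2, 5/3, 8/5, 13/8, ...
ratios = [fib[i + 1] / fib[i] for i in range(len(fib) - 1)]

for r in ratios[:6]:
    print(round(r, 4))
print(round(ratios[-1], 6))
```

The first few ratios jump around (1, 2, 1.5, ...), but the later ones agree with 1.618 to several decimal places.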

What concerns us in this section is the general notion of the terms of a sequence 
getting closer and closer to a number as n goes to oo. Given a sequence of real num- 
bers {c,},_, we want to verify whether or not there is a real number L such that the 
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value of c, gets closer and closer to L as the value of n goes to oo. If such a number 
exists, we call it the limit of {c,}*,. Not every sequence has a limit, for example 
the Fibonacci sequence. 

The intuitive idea of the limit of a sequence is not hard to understand. For example,
it seems clear intuitively that the sequence $\{\frac{1}{n}\}_{n=1}^{\infty}$ has a limit,
which is 0. However, it is not at all trivial to formulate this intuitive notion in a
rigorous way that allows us to formulate proofs of properties of limits of sequences.

If we look at the formulation “the value of $c_n$ gets closer and closer to a number
$L$ as the value of $n$ goes to $\infty$,” we see that there are two parts that need to be
made precise, namely, the part about $c_n$ getting closer and closer to $L$, and the part
about $n$ going to $\infty$. The key idea is to reformulate the notion of getting closer and
closer to something by using a numerical measure of closeness, which is done by a number
often denoted $\varepsilon$. We then say that the limit of $\{c_n\}_{n=1}^{\infty}$ is $L$ if,
for every possible choice of $\varepsilon > 0$, no matter how small, the value of $c_n$ will
eventually stay within distance $\varepsilon$ of $L$ as $n$ goes to $\infty$. In other words,
the limit of $\{c_n\}_{n=1}^{\infty}$ is $L$ if for every $\varepsilon > 0$ we can show that for
all sufficiently large values of $n$, the value of $c_n$ will be within distance $\varepsilon$
of $L$. We will use $N \in \mathbb{N}$ to denote the measure of largeness of $n$. We then say
that the limit of $\{c_n\}_{n=1}^{\infty}$ is $L$ if for every $\varepsilon > 0$ we can show
that there is some $N \in \mathbb{N}$ such that for all $n$ at least as large as $N$, the
number $c_n$ will be within distance $\varepsilon$ of $L$. To say that $c_n$ is within
distance $\varepsilon$ of $L$ is to say that $|c_n - L| < \varepsilon$. We then see that the
rigorous way to say “the value of $c_n$ gets closer and closer to a number $L$ as the value
of $n$ goes to $\infty$” is to say that for every $\varepsilon > 0$, there is some
$N \in \mathbb{N}$ such that for all $n \in \mathbb{N}$ such that $n \ge N$, it is the case
that $|c_n - L| < \varepsilon$.


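Definition 7.8.2. Let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$, and let
$L \in \mathbb{R}$. The sequence $\{c_n\}_{n=1}^{\infty}$ converges to $L$ if for each
$\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \ge N$ imply $|c_n - L| < \varepsilon$. If the sequence $\{c_n\}_{n=1}^{\infty}$ converges
to $L$, then $L$ is the limit of $\{c_n\}_{n=1}^{\infty}$, and we write

\[ \lim_{n \to \infty} c_n = L. \]

If $\{c_n\}_{n=1}^{\infty}$ converges to some real number, the sequence
$\{c_n\}_{n=1}^{\infty}$ is convergent; otherwise the sequence $\{c_n\}_{n=1}^{\infty}$ is
divergent. △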


If the reader finds Definition 7.8.2 a bit hard to follow upon first, or second, en- 
counter, the reader is in good company. This definition is indeed trickier than any 
definition we have seen in this text, and it often takes some practice to attain a
reasonable level of comfort with this definition. Moreover, we are just skimming the
surface in our discussion of the limits of sequences in this section, and substantial 
practice with this, and similar, concepts awaits the reader who takes a course in real 
analysis. 

The use of quantifiers in general was discussed in Section 1.5, and the use of 
quantifiers in proofs was discussed in Section 2.5. In both those sections it was men- 
tioned that when a statement has more than one quantifier, the order of the quantifiers 
matters. A classic example of the importance of the order of quantifiers can be seen 
in Definition 7.8.2. This definition could be written in logical symbols as

\[ (\forall \varepsilon > 0)(\exists N \in \mathbb{N})[(n \in \mathbb{N} \wedge n \ge N) \to |c_n - L| < \varepsilon]. \]

The order of the quantifiers in this statement cannot be changed, as the reader may
verify by thinking about what the statement would mean if the order of the quantifiers
were reversed.

As we saw in our discussion of proofs involving quantifiers in Section 2.5, when
we want to prove a statement with more than one quantifier, we take one quantifier
at a time, in the given order, from the outside in. Suppose that we want to prove that
$\lim_{n \to \infty} c_n = L$, for some sequence $\{c_n\}_{n=1}^{\infty}$ and some
$L \in \mathbb{R}$. The first quantifier that we need to deal with is
$\forall \varepsilon > 0$, which is an abbreviated way of writing
$\forall \varepsilon \in (0, \infty)$. To prove a statement with the universal quantifier
$\forall \varepsilon > 0$, we must choose an arbitrary $\varepsilon > 0$, and then prove the
result for that $\varepsilon$. From this point on in the proof, the arbitrary $\varepsilon$
is fixed, and cannot be changed. Next, we need to deal with the quantifier
$\exists N \in \mathbb{N}$. To prove a statement with this existential quantifier, we simply
need to produce a value of $N$, and then show that it works. How we find the value of $N$ is
part of our scratch work, but is not part of the actual proof. The value of $N$ may
depend upon $\varepsilon$ and upon the sequence $\{c_n\}_{n=1}^{\infty}$. Once $N$ has been
found, we then need to prove
$(n \in \mathbb{N} \wedge n \ge N) \to |c_n - L| < \varepsilon$. To prove such an
implication, we assume $n \in \mathbb{N} \wedge n \ge N$, and we need to deduce that
$|c_n - L| < \varepsilon$. Hence, we proceed by choosing an arbitrary $n \in \mathbb{N}$ such
that $n \ge N$. We then use whatever argumentation is needed to deduce that
$|c_n - L| < \varepsilon$. It is important in such a proof that the arbitrary choices of
$\varepsilon$ and $n$ are indeed arbitrary. Putting the above ideas together, we see that
this type of proof typically has the following form.


Proof. Let $\varepsilon > 0$.

    (argumentation)

Let $N = \ldots$.

    (argumentation)

Let $n \in \mathbb{N}$, and suppose that $n \ge N$.

    (argumentation)

Therefore $|c_n - L| < \varepsilon$.


The above type of proof that $\lim_{n \to \infty} c_n = L$ is often called an
“$\varepsilon$-$N$” proof.


For our first proof involving limits of sequences, observe that in Definition 7.8.2
it is not stated that the number “$L$” in the definition is unique. Of course, if the limit
of a sequence were not unique, we could not properly speak of “the limit” but rather
only of “a limit,” and such a situation would be quite contrary to our intuitive notion
of limits. Fortunately, as we see in the following lemma, it is indeed the case that
if a sequence has a limit, then there is a single number $L$ that $c_n$ is getting closer
and closer to as $n$ goes to $\infty$. In Section 2.5, we discussed proofs of existence and
uniqueness. In the following lemma we prove only uniqueness but not existence,
because not every sequence has a limit, though if a limit exists, then it is unique.


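Lemma 7.8.3. Let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$. Then there is at
most one $L \in \mathbb{R}$ such that $\lim_{n \to \infty} c_n = L$.


Proof. If there is no $L \in \mathbb{R}$ such that $\lim_{n \to \infty} c_n = L$, then there
is nothing to prove. Now suppose that there are $L_1, L_2 \in \mathbb{R}$ such that
$L_1 \ne L_2$, and that $\lim_{n \to \infty} c_n = L_1$ and $\lim_{n \to \infty} c_n = L_2$.
Let $\varepsilon = \frac{|L_1 - L_2|}{2}$. Then $\varepsilon > 0$. Hence by Definition 7.8.2
there is some $N_1 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N_1$ imply
$|c_n - L_1| < \varepsilon$, and there is some $N_2 \in \mathbb{N}$ such that
$n \in \mathbb{N}$ and $n \ge N_2$ imply $|c_n - L_2| < \varepsilon$. Let
$N = \max\{N_1, N_2\}$. Then $N \ge N_1$ and $N \ge N_2$. We now use the Triangle Inequality
(Theorem A.2 (1)) to compute

\begin{align*}
|L_1 - L_2| &= |L_1 - c_N + c_N - L_2| \le |L_1 - c_N| + |c_N - L_2| \\
            &= |c_N - L_1| + |c_N - L_2| < \varepsilon + \varepsilon = 2\varepsilon = 2 \cdot \frac{|L_1 - L_2|}{2} \\
            &= |L_1 - L_2|.
\end{align*}

We have reached a contradiction, and it follows that there is at most one
$L \in \mathbb{R}$ such that $\lim_{n \to \infty} c_n = L$.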


Because of Lemma 7.8.3, we can refer to “the” limit of a sequence, if the limit 
exists. 

Now that we have established the uniqueness of limits of sequences when they 
exist, we turn to a few examples. In the first three parts of this example we will do 
scratch work prior to the actual proof. As the reader will observe, particularly in 
Part (3) of the example, it can happen that the scratch work and the actual proof look 
quite different from each other. 


Example 7.8.4. 


(1) Let $k \in \mathbb{R}$. We will prove that the constant sequence $k, k, k, k, \ldots$ is
convergent, and that its limit is $k$. We can write this constant sequence as
$\{c_n\}_{n=1}^{\infty}$, where $c_n = k$ for all $n \in \mathbb{N}$. We will prove that
$\lim_{n \to \infty} c_n = k$, which could also be written as $\lim_{n \to \infty} k = k$.


Scratch Work. We will work backwards for our scratch work. We want to find
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $|c_n - k| < \varepsilon$,
which is the same as $0 < \varepsilon$, and that is always true. Hence any
$N \in \mathbb{N}$ will work, and we will arbitrarily choose $N = 1$.


Actual Proof. Let $\varepsilon > 0$. Let $N = 1$. Let $n \in \mathbb{N}$, and suppose that
$n \ge N$. Then $|c_n - k| = |k - k| = 0 < \varepsilon$. Hence
$\lim_{n \to \infty} c_n = k$.


(2) We will prove that $\lim_{n \to \infty} \frac{1}{n} = 0$.


Scratch Work. Again, we will work backwards for our scratch work. We want to
find $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply
$|\frac{1}{n} - 0| < \varepsilon$, which is the same as $\frac{1}{n} < \varepsilon$.
Hence, we need some $N \in \mathbb{N}$ such that $\frac{1}{N} < \varepsilon$, which means
that we need to choose some $N \in \mathbb{N}$ such that $N > \frac{1}{\varepsilon}$. It is
intuitively evident that such $N$ can always be found, and formally we can find such $N$ by
Theorem A.3.


Actual Proof. Let $\varepsilon > 0$. By Theorem A.3 there is some $N \in \mathbb{N}$ such
that $\frac{1}{\varepsilon} < N$. Then $\frac{1}{N} < \varepsilon$. Let $n \in \mathbb{N}$,
and suppose that $n \ge N$. Then

\[ \left|\frac{1}{n} - 0\right| = \left|\frac{1}{n}\right| = \frac{1}{n} \le \frac{1}{N} < \varepsilon. \]

Therefore $\lim_{n \to \infty} \frac{1}{n} = 0$.
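
(3) We will prove that $\lim_{n \to \infty} \frac{2n^2}{n^2+3} = 2$.


Scratch Work. This example is trickier than the previous one. We want to find
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply

\[ \left|\frac{2n^2}{n^2+3} - 2\right| < \varepsilon, \]

which is the same as

\[ \left|\frac{-6}{n^2+3}\right| < \varepsilon, \]

which is equivalent to

\[ \frac{6}{n^2+3} < \varepsilon, \]

which in turn is the same as

\[ \frac{6}{\varepsilon} < n^2 + 3. \]

Solving for $n$ we obtain

\[ n > \sqrt{\frac{6}{\varepsilon} - 3}. \]

Unfortunately, this last inequality has a problem when the expression inside the
square root is negative, and we will therefore need to consider two cases. First,
suppose that $\frac{6}{\varepsilon} \ge 3$. Then $\varepsilon \le 2$. In this case we can
use any choice of $N$ such that

\[ N > \sqrt{\frac{6}{\varepsilon} - 3}. \]

Second, suppose that $\frac{6}{\varepsilon} < 3$. Then $\varepsilon > 2$. In this case we
cannot use $\sqrt{\frac{6}{\varepsilon} - 3}$, but fortunately it turns out that we do not
need to. Instead, we observe that if $n \in \mathbb{N}$, then

\[ \frac{6}{n^2+3} < \frac{6}{0^2+3} = 2 < \varepsilon. \]

Hence, in this case any choice of $N$ will work, so we choose $N = 1$.


Actual Proof. Let $\varepsilon > 0$. There are two cases. First, suppose that
$\varepsilon > 2$. Let $N = 1$. Let $n \in \mathbb{N}$, and suppose that $n \ge N$. Then

\[ \left|\frac{2n^2}{n^2+3} - 2\right| = \left|\frac{-6}{n^2+3}\right| = \frac{6}{n^2+3} < 2 < \varepsilon. \]

Second, suppose that $\varepsilon \le 2$. Then $\frac{6}{\varepsilon} - 3 \ge 0$. By
Theorem A.3 there is some $N \in \mathbb{N}$ such that

\[ N > \sqrt{\frac{6}{\varepsilon} - 3}. \]

Let $n \in \mathbb{N}$, and suppose that $n \ge N$. Then

\[ n > \sqrt{\frac{6}{\varepsilon} - 3}. \]

Some rearranging yields

\[ \frac{6}{n^2+3} < \varepsilon, \]

and therefore

\[ \left|\frac{2n^2}{n^2+3} - 2\right| = \frac{6}{n^2+3} < \varepsilon. \]

Putting the two cases together proves that $\lim_{n \to \infty} \frac{2n^2}{n^2+3} = 2$.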


(4) We will prove that the sequence $1, 0, 1, 0, \ldots$ is divergent. We can write this
sequence as $\{c_n\}_{n=1}^{\infty}$, where

\[ c_n = \begin{cases} 1, & \text{if $n$ is odd} \\ 0, & \text{if $n$ is even.} \end{cases} \]

(It is possible to avoid the two cases in the above equation by writing
$c_n = \frac{1 + (-1)^{n+1}}{2}$ for all $n \in \mathbb{N}$, but doing so, while shorter,
makes the proof less clear, and brevity is never worthwhile at the expense of clarity.)
Suppose to the contrary that $\lim_{n \to \infty} c_n = L$ for some $L \in \mathbb{R}$. Let
$\varepsilon = \frac{1}{2}$. Then there is some $N \in \mathbb{N}$ such that
$n \in \mathbb{N}$ and $n \ge N$ imply $|c_n - L| < \frac{1}{2}$. Let
$n_1, n_2 \in \mathbb{N}$, and suppose that $n_1 \ge N$ and $n_1$ is odd, and that
$n_2 \ge N$ and $n_2$ is even. Using the Triangle Inequality (Theorem A.2 (1)) we compute

\begin{align*}
1 = |1 - 0| = |c_{n_1} - c_{n_2}| &= |c_{n_1} - L + L - c_{n_2}| \\
  &\le |c_{n_1} - L| + |L - c_{n_2}| < \frac{1}{2} + \frac{1}{2} = 1.
\end{align*}

We have reached a contradiction, and it follows that $\{c_n\}_{n=1}^{\infty}$ is
divergent. ◊
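
The choices of $N$ made in Parts (2) and (3) of Example 7.8.4 can be spot-checked
numerically. The following Python sketch is an illustration under stated assumptions:
the function names `witness_reciprocal`, `witness_rational` and `holds_from` are our
own, exact fractions are used to avoid rounding error, and only finitely many values of
$n$ are tested, so the check illustrates the proofs but does not replace them.

```python
import math
from fractions import Fraction


def witness_reciprocal(eps):
    """Part (2): return a natural number N with N > 1/eps."""
    return math.floor(1 / eps) + 1


def witness_rational(eps):
    """Part (3): N = 1 if eps > 2; otherwise some N > sqrt(6/eps - 3)."""
    if eps > 2:
        return 1
    # isqrt(floor(x)) + 1 is a natural number strictly larger than sqrt(x).
    return math.isqrt(math.floor(6 / eps - 3)) + 1


def holds_from(c, L, eps, N, how_many=1000):
    """Check |c(n) - L| < eps for n = N, ..., N + how_many - 1 (finite check only)."""
    return all(abs(c(n) - L) < eps for n in range(N, N + how_many))


def reciprocal(n):
    return Fraction(1, n)


def rational(n):
    return Fraction(2 * n * n, n * n + 3)


for eps in [Fraction(3), Fraction(1, 2), Fraction(1, 100)]:
    N2, N3 = witness_reciprocal(eps), witness_rational(eps)
    print(eps, N2, holds_from(reciprocal, 0, eps, N2),
          N3, holds_from(rational, 2, eps, N3))
```

Observe that the code mirrors the structure of an $\varepsilon$-$N$ proof: the value of
$\varepsilon$ is given first, and only then is $N$ produced from it.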


Limits of sequences involve what happens to the terms of the sequence as $n$ goes
to $\infty$. It therefore makes sense intuitively that if finitely many terms of a sequence
are changed, it does not affect whether or not the sequence is convergent, and if the
sequence is convergent, it does not change what the limit of the sequence is. A proof
of this fact is given in Exercise 7.8.3.

We now prove three typical, and useful, theorems about sequences. There are, of 
course, many more such theorems that can be proved about sequences, but we have 
space only for these three. 

Our first theorem, which will be used in the proof of the following theorem, states 
that if a sequence is convergent, then the terms of the sequence cannot become too 
large in absolute value. 


Theorem 7.8.5. Let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$. If
$\{c_n\}_{n=1}^{\infty}$ is convergent, then there is some $B \in \mathbb{R}$ such that
$|c_n| \le B$ for all $n \in \mathbb{N}$.


Proof. Suppose that $\{c_n\}_{n=1}^{\infty}$ is convergent. Then there is some
$L \in \mathbb{R}$ such that $\lim_{n \to \infty} c_n = L$. Hence there is some
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $|c_n - L| < 1$.
It follows from Theorem A.2 (2) that $n \in \mathbb{N}$ and $n \ge N$ imply
$|c_n| - |L| < 1$, and therefore $|c_n| < |L| + 1$. Let

\[ B = \max\{|c_1|, |c_2|, \ldots, |c_{N-1}|, |L| + 1\}. \]

We then see that $|c_k| \le B$ for all $k \in \mathbb{N}$.
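
The construction of $B$ in the proof can be tried on a concrete sequence. The sketch
below is only an illustration: the function name `bound_from_proof` and the sample
sequence $c_n = \frac{10}{n}$ (which has limit 0, and for which $N = 11$ guarantees
$|c_n - 0| < 1$ whenever $n \ge N$) are our own choices.

```python
from fractions import Fraction


def bound_from_proof(c, L, N):
    """Return B = max{|c_1|, ..., |c_{N-1}|, |L| + 1}.

    Assumes N was chosen so that |c(n) - L| < 1 whenever n >= N.
    """
    early_terms = [abs(c(n)) for n in range(1, N)]
    return max(early_terms + [abs(L) + 1])


def c(n):
    # Hypothetical sample sequence with limit 0.
    return Fraction(10, n)


B = bound_from_proof(c, 0, 11)
print(B)
# Finite spot check that |c_n| <= B for the first 2000 terms.
print(all(abs(c(n)) <= B for n in range(1, 2001)))
```

Here the finitely many terms before $N$ determine the bound, which is why they must be
included in the maximum.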


The converse of Theorem 7.8.5 is not true. For example, the sequence 1,0,1,0,... 
satisfies the conclusion of the theorem, but we saw in Example 7.8.4 (4) that it is 
divergent. If a sequence satisfies the conclusion of Theorem 7.8.5, it is customary to 
say that the sequence is “bounded,” though we will not need that terminology here. 

Observe that in Theorem 7.8.5, it is always possible to choose the number B so 
that B > 0. 

Our next theorem shows that limits of sequences behave nicely with respect to 
term-by-term addition, subtraction, multiplication and division of sequences. 


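Theorem 7.8.6. Let $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ be sequences in
$\mathbb{R}$, and let $k \in \mathbb{R}$. Suppose that $\{c_n\}_{n=1}^{\infty}$ and
$\{d_n\}_{n=1}^{\infty}$ are convergent.


1. The sequence $\{c_n + d_n\}_{n=1}^{\infty}$ is convergent, and
   $\lim_{n \to \infty} [c_n + d_n] = \lim_{n \to \infty} c_n + \lim_{n \to \infty} d_n$.
2. The sequence $\{c_n - d_n\}_{n=1}^{\infty}$ is convergent, and
   $\lim_{n \to \infty} [c_n - d_n] = \lim_{n \to \infty} c_n - \lim_{n \to \infty} d_n$.
3. The sequence $\{k c_n\}_{n=1}^{\infty}$ is convergent, and
   $\lim_{n \to \infty} k c_n = k \lim_{n \to \infty} c_n$.
4. The sequence $\{c_n d_n\}_{n=1}^{\infty}$ is convergent, and
   $\lim_{n \to \infty} c_n d_n = [\lim_{n \to \infty} c_n] \cdot [\lim_{n \to \infty} d_n]$.
5. If $\lim_{n \to \infty} d_n \ne 0$, then the sequence
   $\{\frac{c_n}{d_n}\}_{n=1}^{\infty}$ is convergent, and
   $\lim_{n \to \infty} \frac{c_n}{d_n} = \frac{\lim_{n \to \infty} c_n}{\lim_{n \to \infty} d_n}$.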


Proof. We will prove Parts (1) and (4), leaving the rest to the reader in
Exercise 7.8.13.
Let $L = \lim_{n \to \infty} c_n$ and $M = \lim_{n \to \infty} d_n$.

(1). Let $\varepsilon > 0$. Then there is some $N_1 \in \mathbb{N}$ such that
$n \in \mathbb{N}$ and $n \ge N_1$ imply $|c_n - L| < \frac{\varepsilon}{2}$, and there is
some $N_2 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N_2$ imply
$|d_n - M| < \frac{\varepsilon}{2}$. Let $N = \max\{N_1, N_2\}$. Let $n \in \mathbb{N}$, and
suppose that $n \ge N$. The Triangle Inequality (Theorem A.2 (1)) now implies

\[ |(c_n + d_n) - (L + M)| = |(c_n - L) + (d_n - M)| \le |c_n - L| + |d_n - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]

(4). Let $\varepsilon > 0$. By Theorem 7.8.5 there is some $B \in \mathbb{R}$ such that
$|d_n| \le B$ for all $n \in \mathbb{N}$. We may assume that $B > 0$. Therefore
$B + |L| > 0$. Then there is some $N_1 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \ge N_1$ imply $|c_n - L| < \frac{\varepsilon}{B + |L|}$, and there is some
$N_2 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N_2$ imply
$|d_n - M| < \frac{\varepsilon}{B + |L|}$. Let $N = \max\{N_1, N_2\}$. Let
$n \in \mathbb{N}$, and suppose that $n \ge N$. Using the Triangle Inequality and
Exercise 2.4.9 (4) we see that

\begin{align*}
|c_n d_n - LM| &= |c_n d_n - d_n L + d_n L - LM| \le |d_n(c_n - L)| + |L(d_n - M)| \\
               &= |d_n| \cdot |c_n - L| + |L| \cdot |d_n - M| < B \cdot \frac{\varepsilon}{B + |L|} + |L| \cdot \frac{\varepsilon}{B + |L|} = \varepsilon.
\end{align*}

It is important to recognize that in Theorem 7.8.6, it is necessary to assume that
$\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ are convergent. If the sequences are not
convergent, then $\lim_{n \to \infty} c_n$ and $\lim_{n \to \infty} d_n$ do not exist, and so
it would make no sense to write expressions such as
“$\lim_{n \to \infty} c_n + \lim_{n \to \infty} d_n$.” Moreover, it can happen that
$\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ are divergent, and yet
$\{c_n + d_n\}_{n=1}^{\infty}$ is convergent, in which case it would not be plausible to
expect to express $\lim_{n \to \infty} (c_n + d_n)$ in terms of the non-existent limits of
$\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$; the reader is asked to supply an
example of such sequences in Exercise 7.8.6.

A simple use of Theorem 7.8.6 is seen in the following example.


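Example 7.8.7. We saw in Example 7.8.4 (3) that
$\lim_{n \to \infty} \frac{2n^2}{n^2+3} = 2$. The proof in that example used the definition
of limits of sequences directly. Now that we have Theorem 7.8.6, we can give a much easier
proof that this limit holds.

We know by Example 7.8.4 (2) that $\lim_{n \to \infty} \frac{1}{n} = 0$; that
$\varepsilon$-$N$ proof was much simpler than the $\varepsilon$-$N$ proof in
Example 7.8.4 (3). It now follows from Theorem 7.8.6 (4) that

\[ \lim_{n \to \infty} \frac{1}{n^2} = \lim_{n \to \infty} \left[\frac{1}{n} \cdot \frac{1}{n}\right] = \left[\lim_{n \to \infty} \frac{1}{n}\right] \cdot \left[\lim_{n \to \infty} \frac{1}{n}\right] = 0 \cdot 0 = 0. \]

We then use Theorem 7.8.6 (1) (3) (5) to compute

\[ \lim_{n \to \infty} \frac{2n^2}{n^2+3} = \lim_{n \to \infty} \frac{2}{1 + 3 \cdot \frac{1}{n^2}} = \frac{2}{1 + 3 \cdot 0} = 2. \]
◊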

Our last theorem shows that limits of sequences behave nicely with respect to the
relation $\le$.


Theorem 7.8.8. Let $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ be sequences in
$\mathbb{R}$. Suppose that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \ge N$ imply $c_n \le d_n$. If $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ are
convergent, then $\lim_{n \to \infty} c_n \le \lim_{n \to \infty} d_n$.


Proof. Suppose that $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ are convergent.
Let $L = \lim_{n \to \infty} c_n$ and $M = \lim_{n \to \infty} d_n$. Suppose that $M < L$.
Let $\varepsilon = \frac{L - M}{2}$. Then $\varepsilon > 0$. Hence there is some
$N_1 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N_1$ imply
$|c_n - L| < \varepsilon$, and there is some $N_2 \in \mathbb{N}$ such that
$n \in \mathbb{N}$ and $n \ge N_2$ imply $|d_n - M| < \varepsilon$. Let
$P = \max\{N, N_1, N_2\}$. Then $|c_P - L| < \varepsilon$ and $|d_P - M| < \varepsilon$. It
follows that $L - \varepsilon < c_P < L + \varepsilon$ and
$M - \varepsilon < d_P < M + \varepsilon$, and hence

\[ d_P < M + \varepsilon = M + \frac{L - M}{2} = \frac{L + M}{2} = L - \frac{L - M}{2} = L - \varepsilon < c_P. \]

This last inequality contradicts the fact that $c_n \le d_n$ for all $n \in \mathbb{N}$ such
that $n \ge N$. Therefore $L \le M$.


We note that it is not possible to replace $\le$ with $<$ throughout the statement
of Theorem 7.8.8; the reader is asked to supply an example to show why in
Exercise 7.8.7.


Exercises 


Exercise 7.8.1. Using only the definition of limits of sequences, prove that each of 
the following statements is true. 


(1) jim 5-2 7 = 0. 
(2) lim nag = =0. 
(3) tin ntt=1] 


8) lim 3 =0 
(5) $\{n\}_{n=1}^{\infty}$ is divergent.

Exercise 7.8.2. Prove that $\{2n\}_{n=1}^{\infty}$ is divergent, in each of the following two
ways.


(1) Use only the definition of limits of sequences.
(2) Use Theorem 7.8.5.


Exercise 7.8.3. [Used in Section 7.8.] Let $\{c_n\}_{n=1}^{\infty}$ and
$\{d_n\}_{n=1}^{\infty}$ be sequences in $\mathbb{R}$. Suppose that there is some
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $c_n = d_n$. Prove that
$\{c_n\}_{n=1}^{\infty}$ is convergent if and only if $\{d_n\}_{n=1}^{\infty}$ is convergent,
and if they are convergent then $\lim_{n \to \infty} c_n = \lim_{n \to \infty} d_n$.

Exercise 7.8.4. Let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$, and let
$r \in \mathbb{N}$. Let $\{d_n\}_{n=1}^{\infty}$ be the sequence defined by $d_n = c_{n+r}$
for all $n \in \mathbb{N}$. Prove that $\{c_n\}_{n=1}^{\infty}$ is convergent if and only if
$\{d_n\}_{n=1}^{\infty}$ is convergent, and if they are convergent then
$\lim_{n \to \infty} c_n = \lim_{n \to \infty} d_n$.


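Exercise 7.8.5. Let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$, and let
$L \in \mathbb{R}$. Prove that $\lim_{n \to \infty} c_n = L$ if and only if
$\lim_{n \to \infty} [c_n - L] = 0$.


Exercise 7.8.6. [Used in Section 7.8.] Find an example of sequences
$\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ such that
$\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ are divergent, but
$\{c_n + d_n\}_{n=1}^{\infty}$ is convergent.


Exercise 7.8.7. [Used in Section 7.8.] Find an example of sequences
$\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ such that $c_n < d_n$
for all $n \in \mathbb{N}$, and that $\lim_{n \to \infty} c_n = \lim_{n \to \infty} d_n$.


Exercise 7.8.8. Find an example of sequences $\{c_n\}_{n=1}^{\infty}$ and
$\{d_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ such that $\lim_{n \to \infty} c_n = 0$, but
$\{c_n d_n\}_{n=1}^{\infty}$ is divergent.


Exercise 7.8.9. Let $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ be sequences in
$\mathbb{R}$. Suppose that $\lim_{n \to \infty} c_n = 0$, and that there is some
$D \in \mathbb{R}$ such that $|d_n| \le D$ for all $n \in \mathbb{N}$. Prove that
$\lim_{n \to \infty} c_n d_n = 0$.


Exercise 7.8.10. Let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$. Suppose that
$\{c_n\}_{n=1}^{\infty}$ is convergent, and that $\lim_{n \to \infty} c_n > 0$. Prove that
there is some $D > 0$ and some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \ge N$ imply $c_n > D$.


Exercise 7.8.11. Let $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ be sequences in
$\mathbb{R}$. Suppose that $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ are
convergent, and that $\lim_{n \to \infty} c_n = \lim_{n \to \infty} d_n$. Prove that
$\{\min\{c_n, d_n\}\}_{n=1}^{\infty}$ is convergent and
$\lim_{n \to \infty} \min\{c_n, d_n\} = \lim_{n \to \infty} c_n = \lim_{n \to \infty} d_n$.


Exercise 7.8.12. Let $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ be sequences in
$\mathbb{R}$, and let $L \in \mathbb{R}$. Suppose that $\lim_{n \to \infty} d_n = 0$, and
that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply
$|c_n - L| \le d_n$. Prove that $\lim_{n \to \infty} c_n = L$.


Exercise 7.8.13. [Used in Theorem 7.8.6.] Prove Theorem 7.8.6 (2) (3) (5).


Exercise 7.8.14. Let $\{c_n\}_{n=1}^{\infty}$ and $\{d_n\}_{n=1}^{\infty}$ be sequences in
$\mathbb{R}$, and let $k \in \mathbb{R}$. Suppose that $\{c_n\}_{n=1}^{\infty}$ is divergent
and $\{d_n\}_{n=1}^{\infty}$ is convergent.


(1) Prove that $\{c_n + d_n\}_{n=1}^{\infty}$ is divergent.
(2) Prove that $\{c_n - d_n\}_{n=1}^{\infty}$ is divergent.
(3) Prove that $\{k c_n\}_{n=1}^{\infty}$ is divergent.


Exercise 7.8.15. Let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$.

(1) Let $L \in \mathbb{R}$. Prove that if $\lim_{n \to \infty} c_n = L$, then
    $\lim_{n \to \infty} |c_n| = |L|$.
(2) Prove that $\lim_{n \to \infty} c_n = 0$ if and only if
    $\lim_{n \to \infty} |c_n| = 0$.
(3) Give an example of a sequence $\{d_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ such that
    $\{|d_n|\}_{n=1}^{\infty}$ is convergent, but $\{d_n\}_{n=1}^{\infty}$ is divergent.


Explorations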


The imagination in a mathematician who creates makes no less difference 
than in a poet who invents. 
— Jean d’Alembert (1717-1783) 


8.1 Introduction 


We now turn things over to the reader. The goal of this book is for the student to learn 
how to do mathematics as mathematicians currently do it. The ideas we have covered, 
such as proofs, sets, functions and relations, are in the tool bag of any working math- 
ematician. There is, however, one aspect of mathematics that we have not seen until 
now. So far the reader has been learning the material from the text, using exercises 
as practice. The material in the text is presented in a straight path, going one step 
forward at a time. The exercises, as the reader may assume, can all be solved using 
the material that was discussed until that point. Mathematical research, by contrast, 
is not so straightforward. 

Research in mathematics involves discovering, and then proving, new theorems.
Contrary to popular misconception, mathematics has not been “all figured out.”
Indeed, more new mathematics is being discovered today than at any other period in 
history. What makes research so exciting is precisely that there is no text, and no 
clear path, to follow. The researcher has to try examples, develop an intuitive feeling 
for what is going on, formulate proposed definitions, try to prove theorems using 
these definitions, go back to the drawing board if things do not work out and so on. 
This process can be tiring and frustrating, and often involves attempts that turn out 
to lead nowhere, but for the sake of those times when the ideas do come together in 
the proof of a new theorem, it is well worth it. 

At the level of this text, it is not possible to do any research in the sense of being 
at the cutting edge of some branch of mathematics. Nonetheless, we can attempt to 
create the feeling of mathematics research for the reader by providing some open- 
ended topics to be explored. These topics are all known to mathematicians, but we
assume that the reader has not seen them. For each of these topics, we give a few
definitions, and raise a few questions, and leave the rest to the reader’s imagination.

The reader should pick a topic, and then play with it. Formulate conjectures, 
make whatever extra definitions are necessary, try to come up with theorems and 
proofs and write up the results of this exploration as if they were meant to be an 
additional section for this book. The writing should include definitions, examples, 
lemmas, theorems, proofs and informal discussion. The target audience for this ex- 
position should be the other students who have seen the material from this book, 
but nothing beyond it (and in particular, have not looked at the topic under con- 
sideration). The reader should not look the topic up in other books until her own 
exploration and writing is complete—the point is not to see what else is known, but 
to explore as much as possible on one’s own. 

Rather than choosing one of the topics suggested below, the reader could try to 
come up with a topic of her own to explore. Doing so is a good way to ensure that 
the topic will be enjoyable, but it is also risky, because some proposed avenues of 
exploration may not lead anywhere, and others may be too difficult. Consult with the 
instructor of the course about any such ideas. 


8.2 Greatest Common Divisors 


A standard construction that is taught in elementary school is to find the greatest 
common divisor of two integers. For example, the greatest common divisor of 12 
and 16 is 4. Greatest common divisors are useful not only in school mathematics, but 
also in advanced topics such as number theory. Recall the definition of an integer a 
dividing an integer b, denoted a|b, given in Definition 2.2.1. 


Definition 8.2.1. Let $a, b \in \mathbb{Z}$. If at least one of $a$ or $b$ is not zero, the
greatest common divisor of $a$ and $b$, denoted $(a,b)$, is the largest integer that divides
both $a$ and $b$. If $a = 0$ and $b = 0$, let $(0,0) = 0$. △


For example, we see that (27,36) = 9. The notation (a,b) for the greatest com- 
mon divisor of a and b is somewhat unfortunate, because the same notation can also 
mean an ordered pair or an open bounded interval in the real numbers, but it is quite 
standard, and rarely causes confusion when read in context. 

Before proceeding, we need the following lemma. 


Lemma 8.2.2. Let $a, b \in \mathbb{Z}$. Then $(a,b)$ exists, and $(a,b) \ge 0$.


Proof. If $a = 0$ and $b = 0$ then $(a,b)$ exists by definition, and $(a,b) \ge 0$. Now
suppose that at least one of $a$ or $b$ is not zero. Let

\[ S = \{d \in \mathbb{N} \mid d|a \text{ and } d|b, \text{ and } d > 0\}. \]

Observe that the set $S$ is non-empty, because it contains 1, and that if $x \in S$, then
$x$ is less than or equal to the larger of $|a|$ and $|b|$. It follows from Exercise 6.6.2
that $S$ is finite. By Exercise 6.6.5 there exists some $k \in S$ such that $p \le k$ for all
$p \in S$. Because any divisor of $a$ and $b$ that is not in $S$ is negative, and is
therefore less than $k$, then $k$ is the greatest common divisor of $a$ and $b$. Hence
$(a,b)$ exists, and $(a,b) \ge 0$.


The following related definition is useful in the study of greatest common divi- 
sors. 


Definition 8.2.3. Let $a, b \in \mathbb{Z}$. The numbers $a$ and $b$ are relatively prime if
$(a,b) = 1$. △


For example, the numbers 15 and 28 are relatively prime. 

It is possible to prove many results about greatest common divisors, some simple 
and some more substantial. A typical simple result is the following proposition. It 
might appear at first glance that the proposition is entirely trivial, if one thinks about 
greatest common divisors in terms of factoring all the relevant integers into unique 
prime factors. This fact, known as the Fundamental Theorem of Arithmetic, is a 
substantial result that we have not proved, and so it should not be used here. All 
results about greatest common divisors that the reader considers should be proved 
using only what we have proved in this book. 


Proposition 8.2.4. Let $a, b \in \mathbb{Z}$. If $d = (a,b)$ is not zero, then
$(\frac{a}{d}, \frac{b}{d}) = 1$.


Proof. Observe that $\frac{a}{d}$ and $\frac{b}{d}$ are integers. Let $r \in \mathbb{Z}$.
Suppose that $r|\frac{a}{d}$ and $r|\frac{b}{d}$. Then there are $m, n \in \mathbb{Z}$ such
that $rm = \frac{a}{d}$ and $rn = \frac{b}{d}$. Then $a = rmd$ and $b = rnd$. Hence
$(rd)|a$ and $(rd)|b$. Because $d$ is the largest integer that divides $a$ and $b$, it
follows that $rd \le d$. Using the fact that $d > 0$, we deduce that $r \le 1$. Because 1
divides both $\frac{a}{d}$ and $\frac{b}{d}$, we see that $(\frac{a}{d}, \frac{b}{d}) = 1$.


A look at some examples of greatest common divisors shows that the greatest 
common divisor of any two integers a and b is not only greater than all other integers 
that divide a and b, but in fact is divisible by every integer that divides a and b. This 
fact is stated in the following theorem. 


Theorem 8.2.5. Let $a, b, p \in \mathbb{Z}$. If $p|a$ and $p|b$, then $p|(a,b)$.


Theorem 8.2.5 follows from the next result.


Theorem 8.2.6. Let $a, b \in \mathbb{Z}$. Then there are $m, n \in \mathbb{Z}$ such that
$(a,b) = ma + nb$.


Theorem 8.2.6 is proved by using the Well-Ordering Principle (Theorem 6.2.5)
applied to the set of all natural numbers of the form $ma + nb$, and then using the
Division Algorithm (Theorem A.5 in the Appendix). It is left to the reader to work
out the details of the proofs of Theorem 8.2.6 and Theorem 8.2.5.
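
When looking for conjectures, it can help to compute many examples. The following
Python sketch computes greatest common divisors directly from Definition 8.2.1 by brute
force, and searches for integers $m$ and $n$ as in Theorem 8.2.6. The function names
`divides`, `gcd_by_definition` and `coefficients_by_search` are our own, and the search
over a bounded window is simply a convenient way to experiment; it is not the method of
proof suggested above, and it proves nothing.

```python
def divides(a, b):
    """Return True if a divides b, that is, b = a*q for some integer q."""
    if a == 0:
        return b == 0
    return b % a == 0


def gcd_by_definition(a, b):
    """Greatest common divisor computed directly from Definition 8.2.1."""
    if a == 0 and b == 0:
        return 0
    bound = max(abs(a), abs(b))  # no positive common divisor exceeds this
    return max(d for d in range(1, bound + 1) if divides(d, a) and divides(d, b))


def coefficients_by_search(a, b):
    """Search a bounded window for integers m, n with m*a + n*b == (a, b)."""
    g = gcd_by_definition(a, b)
    window = max(abs(a), abs(b), 1)
    for m in range(-window, window + 1):
        for n in range(-window, window + 1):
            if m * a + n * b == g:
                return m, n
    return None  # nothing found inside the window


print(gcd_by_definition(12, 16), gcd_by_definition(27, 36), gcd_by_definition(15, 28))
m, n = coefficients_by_search(12, 16)
print(m, n, m * 12 + n * 16)
```

Running such experiments on a range of pairs, including pairs with negative or zero
entries, is a good way to test a conjecture before attempting to prove it.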

The reader’s task is to conjecture and prove as many results as possible about 
greatest common divisors, using only what is stated above. 

Greatest common divisors are discussed in many texts on number theory, for
example [Ros05, Section 3.3].


8.3 Divisibility Tests


There are a number of known methods for determining whether one integer is divisible
by another integer. For example, an integer is divisible by 9 if and only if the sum
of its digits is divisible by 9. Therefore, it is easily seen that 107523 is divisible by 9,
because $1 + 0 + 7 + 5 + 2 + 3 = 18$, and we know that 18 is divisible by 9. A proof of
the validity of this method relies on the notion of congruence modulo 9, as discussed
in Section 5.2. More precisely, it is shown in Exercise 5.2.12 that if
$a_m a_{m-1} \cdots a_2 a_1$ is a natural number written in decimal notation, then

\[ \sum_{i=1}^{m} a_i 10^{i-1} \equiv \sum_{i=1}^{m} a_i \pmod{9}. \]

The left-hand side of this congruence is the value of the integer written
$a_m a_{m-1} \cdots a_2 a_1$ in decimal notation, and the right-hand side is the sum of the
digits. Our method for verifying divisibility by 9 follows immediately from this congruence.
We could also take this process one step further, using the notation of Exercise 5.2.13. If
$x \in \mathbb{N}$, we let $\Sigma(x)$ denote the result of repeatedly adding the digits of
$x$ until a single digit remains. It follows that a positive integer $x$ is divisible by 9
if and only if $\Sigma(x) = 9$.
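
The digit-sum test is easy to experiment with on a computer. The following Python
sketch is an illustration only; the function names `digit_sum` and
`repeated_digit_sum` are our own, and the second function is meant to play the role of
repeatedly adding digits until a single digit remains, for positive integers written in
decimal notation.

```python
def digit_sum(x):
    """Sum of the decimal digits of a positive integer x."""
    return sum(int(ch) for ch in str(x))


def repeated_digit_sum(x):
    """Add the digits of x repeatedly until a single digit remains."""
    while x >= 10:
        x = digit_sum(x)
    return x


x = 107523
print(digit_sum(x), repeated_digit_sum(x), x % 9 == 0)

# Finite spot check of the divisibility test for the first few thousand integers.
print(all((n % 9 == 0) == (repeated_digit_sum(n) == 9) for n in range(1, 5000)))
```

Replacing 9 by another number in the spot check, and seeing whether it still succeeds,
is one way to begin looking for similar tests.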

The reader’s task is to try to find, and prove, similar methods for determining 
divisibility by other numbers. A good place to start is with divisibility by each of 2, 
3 and 5. It is also possible to use different bases for writing integers, instead of only 
decimal notation. 

A reference for this topic is [Ros05, Section 5.1]. 


8.4 Real-Valued Functions 


In Section 4.3 we discussed the most broadly applicable way of combining functions,
namely, composition. In some specific situations, however, there are other ways
to combine functions. In calculus courses, for example, we regularly deal with sums,
differences, products and quotients of functions $\mathbb{R} \to \mathbb{R}$. From the point
of view of adding, subtracting, multiplying and dividing functions, it turns out to be
irrelevant that the domain of these functions is $\mathbb{R}$ (though of course the domain
being $\mathbb{R}$ is very important for taking derivatives and integrals). The addition,
subtraction, multiplication and division of functions take place in the codomain, and hence
these four operations can be applied to any functions with codomain $\mathbb{R}$ (or certain
other sets such as the complex numbers, but we will not deal with that here).
For convenience we use the following terminology. 


Definition 8.4.1. A real-valued function is a function of the form
$f\colon X \to \mathbb{R}$, where $X$ is a set. △


Your task is to explore the properties of real-valued functions. For example, we
can define addition of real-valued functions as follows.

Definition 8.4.2. Let $X$ be a set, and let $f, g\colon X \to \mathbb{R}$ be functions. The
sum of $f$ and $g$, denoted $f + g$, is the function $f + g\colon X \to \mathbb{R}$ defined
by

\[ (f + g)(x) = f(x) + g(x) \]

for all $x \in X$. △


Observe that we can add two real-valued functions only if they have the same 
domains. Definition 8.4.2 is often referred to as “pointwise addition,” because it is 
done separately for each element in the domain. It is possible to define other point- 
wise operations, for example subtraction, multiplication and division. 

The following lemma is a typical simple result about addition of real-valued func- 
tions. For those familiar with the term, this lemma says that addition of real-valued 
functions is commutative. 


Lemma 8.4.3. Let $X$ be a set, and let $f, g\colon X \to \mathbb{R}$ be functions. Then
$f + g = g + f$.


Proof. Clearly $f + g$ and $g + f$ have the same domain, the set $X$, and the same
codomain, the set $\mathbb{R}$. Let $x \in X$. Then

\[ (f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x), \]

where the middle equality holds because we know that $a + b = b + a$ for all
$a, b \in \mathbb{R}$, and we know that $f(x)$ and $g(x)$ are real numbers. (Recall that
$f(x)$ and $g(x)$ are values in the codomain, which in this case is $\mathbb{R}$, and are
not the names of the functions, which are simply $f$ and $g$.) Hence $f + g = g + f$.
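
Pointwise addition translates directly into code, which can be convenient for testing
conjectures on small examples. In the following Python sketch, the function name
`pointwise_sum`, the three-element domain `X` and the two sample functions are our own
hypothetical choices; exact fractions stand in for real numbers.

```python
from fractions import Fraction


def pointwise_sum(f, g):
    """Return the function f + g, defined by (f + g)(x) = f(x) + g(x)."""
    def h(x):
        return f(x) + g(x)
    return h


# A hypothetical domain X and two real-valued functions on it, given by tables.
X = ["red", "green", "blue"]
f = {"red": Fraction(1, 2), "green": Fraction(-3), "blue": Fraction(7, 4)}.get
g = {"red": Fraction(2), "green": Fraction(1, 3), "blue": Fraction(0)}.get

f_plus_g = pointwise_sum(f, g)
g_plus_f = pointwise_sum(g, f)

# The two sums agree at every element of X, as Lemma 8.4.3 asserts.
print(all(f_plus_g(x) == g_plus_f(x) for x in X))
```

Note that the domain `X` here has nothing to do with numbers; only the codomain
matters for the addition.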


The reader should try to conjecture and prove other results about addition of real- 
valued functions, and should define other operations (such as multiplication), and 
also relations (such as less than), for real-valued functions, and then prove results 
about those definitions. 


8.5 Iterations of Functions 


The idea of iterations of functions was used in Exercise 4.4.20 and Exercise 4.4.21. 
Those two exercises are rather lengthy and difficult. Here we wish to look at some 
simpler properties of iterations of functions. For convenience, we repeat the basic 
definition. (As mentioned in Exercise 4.4.20, this definition, while intuitively
reasonable, is not entirely rigorous, because the use of “$\cdots$” is not rigorous; a completely
rigorous definition was given in Example 6.4.2 (2).)


Definition 8.5.1. Let $A$ be a non-empty set, and let $f\colon A \to A$ be a function.
For each $n \in \mathbb{N}$, let $f^n$ denote the function $A \to A$ given by

$$f^n = \underbrace{f \circ \cdots \circ f}_{n \text{ times}}.$$

The function $f^n$ is the $n$-fold iteration of $f$. △

As simple as this definition might appear, iterations of functions are of great
importance in many branches of mathematics, and have been the focus of particular 
attention in the field of dynamical systems, which deals, among many other things, 
with the much talked about fractals and “chaos.” 
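
To make the $n$-fold iteration concrete, here is a minimal Python sketch. It is an illustration only: the helper name iterate, the sample set A and the sample function shift are our own choices, not part of the text.

```python
def iterate(f, n):
    """Return the n-fold iteration of f (f composed with itself n times), for n >= 1."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")

    def f_n(x):
        # Apply f a total of n times, starting from x.
        for _ in range(n):
            x = f(x)
        return x

    return f_n


# Hypothetical example: A = {0, 1, 2, 3, 4} and f(a) = a + 1 modulo 5.
A = range(5)
shift = lambda a: (a + 1) % 5

print([iterate(shift, 3)(a) for a in A])  # [3, 4, 0, 1, 2]
```

Because the domain and codomain of the function are the same set, each output can be fed back in as an input, which is what makes the loop legitimate.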

Your task is to explore various properties of iterations of functions. Some possible 
questions to look at involve the following concepts. 


Definition 8.5.2. Let $A$ be a non-empty set, and let $f\colon A \to A$ be a function. The
function $f$ is nilpotent if $f^n = 1_A$ for some $n \in \mathbb{N}$. The function $f$ is hidempotent
if $f^n = f$ for some $n \in \mathbb{N}$ such that $n \geq 2$. The function $f$ is constantive if $f^n$ is a
constant function for some $n \in \mathbb{N}$. △


The term “nilpotent” is quite standard, whereas the other two terms in
Definition 8.5.2 are not (though “hidempotent” is meant to suggest the standard term
“idempotent,” which means that $f^2 = f$).

There are many questions to be asked about these concepts. Is there a constantive
function that is not constant? For any $r \in \mathbb{N}$ such that $r \geq 2$, is there a function
$f\colon A \to A$ for some set $A$ such that $f^r$ is a constant function, but $f^{r-1}$ is not a constant
function? Is there a nilpotent function that is not the identity function? For any given
$r \in \mathbb{N}$ such that $r \geq 2$, is there a function $g\colon A \to A$ for some set $A$ such that $g^r = 1_A$,
but $g^{r-1} \neq 1_A$? If a function is nilpotent or hidempotent, is it necessarily bijective?
If a function is hidempotent and bijective, is it necessarily nilpotent? If a function is
nilpotent, is it necessarily hidempotent? Do stronger conclusions hold when the set
$A$ is finite?
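
When the set $A$ is finite, questions of this kind can be explored by brute force before any proof is attempted. The following Python sketch is one possible way to do so, under assumptions of our own: a function on $A = \{0, \ldots, m-1\}$ is stored as a tuple whose entry at position a is the value at a, and the helper name classify is hypothetical. Computer experiments can suggest conjectures, but they do not replace proofs.

```python
from itertools import product


def classify(f):
    """Classify a function on A = {0, ..., m-1}, given as a tuple with f[a] = f(a)."""
    identity = tuple(range(len(f)))
    seen = []        # the distinct iterates f^1, f^2, ... found so far
    current = f
    while current not in seen:
        seen.append(current)
        # Next iterate: a -> f(current(a)), that is, f composed with current.
        current = tuple(f[c] for c in current)
    return {
        "nilpotent": identity in seen,
        # The iterates return to f itself exactly when f^n = f for some n >= 2.
        "hidempotent": current == f,
        "constantive": any(len(set(h)) == 1 for h in seen),
    }


print(classify((1, 2, 0)))
# {'nilpotent': True, 'hidempotent': True, 'constantive': False}
print(classify((0, 0, 1)))
# {'nilpotent': False, 'hidempotent': False, 'constantive': True}

# All 27 functions from {0, 1, 2} to itself, ready for experiments.
all_functions = list(product(range(3), repeat=3))
```

The loop terminates because a finite set has only finitely many functions to itself, so the list of iterates must eventually repeat.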

The reader is asked to consider the above questions, and to try to think up other 
definitions and questions about iterations of functions, and to try to solve as many of 
those questions as possible. 

See [HW91, Chapter 5] or [ASY97] for details about iterations of functions in 
connection with dynamical systems and chaos. 


8.6 Fibonacci Numbers and Lucas Numbers 


In Section 6.4 we briefly discussed the Fibonacci numbers. There is much more that 
can be said about these remarkable numbers. We suggest four possible avenues for 
exploration. 


A) More Fibonacci Formulas 


We gave a number of nice formulas for the Fibonacci numbers in Proposition 6.4.6, 
Equation 6.4.1 and Exercises 6.4.7—6.4.9. The reader should play with the Fibonacci 
numbers, and try to find and prove other formulas for these numbers. 
B) Lucas Numbers


The Fibonacci numbers are not the only sequence of numbers that obey the Fibonacci 
recursion relation. If we change the initial two numbers, we obtain a different se- 
quence. One such sequence that is often studied in conjunction with the Fibonacci 
numbers is the Lucas sequence, which starts 


$$1, 3, 4, 7, 11, 18, 29, 47, 76, 123, \ldots$$


The numbers in this sequence are referred to as Lucas numbers. Let $L_1, L_2, L_3, \ldots$
denote the terms of the Lucas sequence. This sequence is formally defined using
Definition by Recursion as the sequence specified by $L_1 = 1$, and $L_2 = 3$, and
$L_{n+2} = L_{n+1} + L_n$ for all $n \in \mathbb{N}$. We use Theorem 6.4.5 to verify that such a sequence exists.
The Lucas numbers turn out to be of use in primality testing; see [Rib96, Sections 2.4 
and 2.5] for details. 
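
The recursive definition translates directly into a short program. The Python sketch below is our own illustration (the function name lucas is an assumption); it produces the first terms of the Lucas sequence from the two initial values and the recursion relation.

```python
def lucas(count):
    """Return the list [L_1, ..., L_count] of the first `count` Lucas numbers."""
    terms = []
    a, b = 1, 3          # L_1 = 1 and L_2 = 3
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b  # L_{n+2} = L_{n+1} + L_n
    return terms


print(lucas(10))  # [1, 3, 4, 7, 11, 18, 29, 47, 76, 123]
```

Changing the two initial values in the line marked with $L_1$ and $L_2$ gives other sequences that obey the same recursion relation.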

The reader’s task is to conjecture and prove formulas for the Lucas numbers. Start 
by considering the analogs of the various formulas we have seen for the Fibonacci 
numbers. For example, do the analogs of the three parts of Proposition 6.4.6 hold 
for the Lucas numbers? Is there an explicit formula for the Lucas numbers similar to 
Binet’s formula (Equation 6.4.1)? 


C) Relations between Fibonacci and Lucas Numbers 


There are some formulas relating the Fibonacci numbers and the Lucas numbers. 
One such formula is $L_n = F_n + 2F_{n-1}$ for all $n \in \mathbb{N}$ such that $n \geq 2$. The reader should
prove this formula, and try to find and prove other such formulas. 
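
Before trying to prove a conjectured formula, it is sensible to test it numerically. The following Python sketch checks the formula $L_n = F_n + 2F_{n-1}$ for small values of $n$, assuming the Fibonacci sequence starts $1, 1, 2, 3, \ldots$; the helper name recursion_sequence and the bound N are our own choices. A successful check for finitely many values is evidence, not a proof.

```python
def recursion_sequence(first, second, count):
    """Return the first `count` terms of the sequence with the given two initial
    terms in which each later term is the sum of the previous two."""
    terms = [first, second]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


N = 20
F = recursion_sequence(1, 1, N)  # Fibonacci numbers F_1, ..., F_N
L = recursion_sequence(1, 3, N)  # Lucas numbers L_1, ..., L_N

# Python lists are indexed from 0, so F[n - 1] is F_n and L[n - 1] is L_n.
formula_holds = all(L[n - 1] == F[n - 1] + 2 * F[n - 2] for n in range(2, N + 1))
print(formula_holds)  # True
```

The same few lines can be adapted to test any other conjectured relation between the two sequences.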


D) Fibonacci Numbers Modulo k 


Let $k \in \mathbb{N}$. We can then look at the Fibonacci sequence modulo $k$, which we obtain by
taking the Fibonacci sequence, and replacing each Fibonacci number with the unique
integer in $\{0, 1, \ldots, k - 1\}$ that is congruent to it modulo $k$. For example, if we use
$k = 3$, we obtain the modulo 3 Fibonacci sequence, which starts

$$1, 1, 2, 0, 2, 2, 1, 0, 1, 1, 2, 0, \ldots$$

Let $F_1^{(3)}, F_2^{(3)}, \ldots$ denote the terms of this sequence. Observe that
$F_{n+2}^{(3)} \equiv F_{n+1}^{(3)} + F_n^{(3)} \pmod{3}$ for all $n \in \mathbb{N}$, as can be proved using Lemma 5.2.11. Observe also that this
sequence repeats itself. What can we deduce about the original Fibonacci sequence 
from this repetition? The reader should play around with these ideas using various 
values for k, and try to formulate and prove results about either the original Fibonacci 
sequence, or the modulo k Fibonacci sequence for specific values of k. 
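
Playing around with various values of $k$ is much easier with a computer. The following Python sketch is one way to do it; the function names fibonacci_mod and period are our own, and the sketch assumes the Fibonacci sequence starts $1, 1, 2, 3, \ldots$.

```python
def fibonacci_mod(k, count):
    """Return the first `count` terms of the Fibonacci sequence modulo k."""
    terms = []
    a, b = 1 % k, 1 % k
    for _ in range(count):
        terms.append(a)
        a, b = b, (a + b) % k
    return terms


def period(k):
    """Return the smallest p >= 1 such that the modulo k Fibonacci sequence
    repeats after p terms."""
    start = (1 % k, 1 % k)
    a, b = start
    p = 0
    while True:
        # Each pair of consecutive terms determines the next pair.
        a, b = b, (a + b) % k
        p += 1
        if (a, b) == start:
            return p


print(fibonacci_mod(3, 12))  # [1, 1, 2, 0, 2, 2, 1, 0, 1, 1, 2, 0]
print(period(3))             # 8
```

The function period relies on the fact that the sequence returns to its initial pair of terms; the reader may wish to think about why that must happen for every $k$.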


Some sources with many results about the Fibonacci numbers are [Knu73, Sec- 
tion 1.2.8 and exercises], [GKP94, Section 6.6] and [HHP97, Chapter 3]. 
8.7 Fuzzy Sets


A fundamental feature of sets is that any element either is in a given set or is not. 
There is no concept of something “probably” being in a set, nor of one element 
having a higher probability of being in a set than another. Unfortunately, the real 
world does not always give us black-and-white information, and so a more flexible 
notion of a “set” is helpful in dealing with some real-world problems. In response 
to this need, a theory of “fuzzy sets,” “fuzzy logic” and other related “fuzzy” things 
was developed in the 1960s. These ideas have applications in data analysis, pattern 
recognition, database management and other areas. Here we will just introduce the 
most basic definition concerning fuzzy sets. 

The method of introducing uncertainty into the definition of sets is to use the no- 
tion of characteristic maps (as discussed in Exercise 4.1.8, but which we will repeat 
here). For the entirety of our discussions, we will need to think of all sets under con- 
sideration as being subsets of some large set X, which in practice is not a problem in 
any given situation. 


Definition 8.7.1. Let $X$ be a non-empty set, and let $S \subseteq X$ be a subset. The
characteristic map for $S$ in $X$, denoted $\chi_S$, is the function $\chi_S\colon X \to \{0, 1\}$ defined by

$$\chi_S(y) = \begin{cases} 1, & \text{if } y \in S \\ 0, & \text{if } y \in X - S. \end{cases}$$
△


The characteristic map $\chi_S$ maps everything in the set $S$ to 1, and everything else
to 0, and it is therefore useful for identifying the subset $S$. In Exercise 4.1.8, it was
proved that if $A, B \subseteq X$ are subsets, then $\chi_A = \chi_B$ if and only if $A = B$ (the former
equality is of functions, the latter of sets).

To allow fuzziness, we use characteristic maps that have values anywhere in the 
interval [0, 1], rather than in the two-element set {0, 1}. However, rather than defining 
the notion of a “fuzzy set” directly, and then defining characteristic maps for such 
sets, we simply let our broader type of characteristic maps be our new kind of sets. 


Definition 8.7.2. Let $X$ be a non-empty set. A fuzzy subset $A$ of $X$ is a function
$\mu_A\colon X \to [0, 1]$. △


The idea is that if $A$ is a fuzzy subset of $X$, then $x \in X$ is definitely in $A$ if $\mu_A(x) = 1$,
is definitely not in $A$ if $\mu_A(x) = 0$ and is somewhere in between if $0 < \mu_A(x) < 1$.
Observe that a function $\mu_A\colon X \to [0, 1]$ is not the name of the fuzzy subset of $X$, but
rather $A$ is the name of the fuzzy subset; the function $\mu_A$ defines the fuzzy subset $A$.
Observe also that the functions $\mu_A$ need not be particularly nice (for example, they
do not need to be continuous). It is important to recognize that we only have fuzzy
subsets of a given set $X$, but not fuzzy sets on their own.

Once we have fuzzy subsets, we can also discuss unions, intersections and the 
like. Some sample definitions are as follows. 


Definition 8.7.3. Let $X$ be a non-empty set, and let $A$ and $B$ be fuzzy subsets of $X$.

1. The empty fuzzy subset in $X$, denoted $\emptyset$, is defined by $\mu_\emptyset(x) = 0$ for all
$x \in X$.

2. The fuzzy subset $A$ is a subset of the fuzzy subset $B$ if $\mu_A(x) \leq \mu_B(x)$ for all
$x \in X$.

3. The complement of $A$, denoted $A'$, is the fuzzy subset $C$ of $X$ defined by
$\mu_C(x) = 1 - \mu_A(x)$ for all $x \in X$.

4. The union of $A$ and $B$, denoted $A \cup B$, is the fuzzy subset $D$ of $X$ defined by
$\mu_D(x) = \max\{\mu_A(x), \mu_B(x)\}$ for all $x \in X$.

5. The intersection of $A$ and $B$, denoted $A \cap B$, is the fuzzy subset $E$ of $X$ defined
by $\mu_E(x) = \min\{\mu_A(x), \mu_B(x)\}$ for all $x \in X$.

6. The algebraic product of $A$ and $B$, denoted $A \bullet B$, is the fuzzy subset $F$ of $X$
defined by $\mu_F(x) = \mu_A(x) \cdot \mu_B(x)$ for all $x \in X$.

7. The algebraic sum of $A$ and $B$, denoted $A \oplus B$, is the fuzzy subset $G$ of $X$
defined by $\mu_G(x) = \mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)$ for all $x \in X$. △


It is left to the reader to verify that the algebraic sum of two fuzzy subsets is
indeed a fuzzy subset (the issue is that the characteristic map must have codomain
$[0, 1]$). We are using some of the same notation for fuzzy subsets as for regular sets
(sometimes referred to as “crisp” sets in fuzzy set literature); this notation is standard, 
and there is usually no confusion in a given context. 
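
For a finite set $X$, the operations in Definition 8.7.3 are easy to try out on a computer. The following Python sketch is an illustration under our own assumptions: a fuzzy subset of $X$ is stored as a dictionary sending each element of $X$ to its value in $[0, 1]$, and the set X, the fuzzy subsets mu_A and mu_B and all function names are chosen for the example.

```python
def is_fuzzy_subset(mu):
    """Check that every value lies in the interval [0, 1]."""
    return all(0 <= value <= 1 for value in mu.values())

def is_subset(mu1, mu2):
    return all(mu1[x] <= mu2[x] for x in mu1)

def complement(mu):
    return {x: 1 - mu[x] for x in mu}

def union(mu1, mu2):
    return {x: max(mu1[x], mu2[x]) for x in mu1}

def intersection(mu1, mu2):
    return {x: min(mu1[x], mu2[x]) for x in mu1}

def algebraic_product(mu1, mu2):
    return {x: mu1[x] * mu2[x] for x in mu1}

def algebraic_sum(mu1, mu2):
    return {x: mu1[x] + mu2[x] - mu1[x] * mu2[x] for x in mu1}


# Two hypothetical fuzzy subsets of X = {"a", "b", "c"}.
mu_A = {"a": 0.0, "b": 0.5, "c": 1.0}
mu_B = {"a": 0.25, "b": 0.5, "c": 0.75}

print(union(mu_A, mu_B))          # {'a': 0.25, 'b': 0.5, 'c': 1.0}
print(intersection(mu_A, mu_B))   # {'a': 0.0, 'b': 0.5, 'c': 0.75}
print(algebraic_sum(mu_A, mu_B))  # {'a': 0.25, 'b': 0.75, 'c': 1.0}
print(is_fuzzy_subset(algebraic_sum(mu_A, mu_B)))  # True
```

The sample values were chosen to be exactly representable in floating-point arithmetic; with other values, small rounding errors may appear in the output.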

Just as we proved various properties of operations on regular sets in Section 3.3, 
we can prove similar properties for operations on fuzzy subsets. For example, we 
have the following Distributive Law. 


Lemma 8.7.4. Let $X$ be a non-empty set, and let $A$, $B$ and $C$ be fuzzy subsets of $X$.
Then $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.


Proof. Let $x \in X$. We need to show that

$$\min\{\mu_A(x), \max\{\mu_B(x), \mu_C(x)\}\} = \max\{\min\{\mu_A(x), \mu_B(x)\}, \min\{\mu_A(x), \mu_C(x)\}\}.$$

There are a number of cases. If $\mu_A(x) \leq \mu_B(x)$ and $\mu_A(x) \leq \mu_C(x)$, then

$$\min\{\mu_A(x), \max\{\mu_B(x), \mu_C(x)\}\} = \mu_A(x)$$

and

$$\max\{\min\{\mu_A(x), \mu_B(x)\}, \min\{\mu_A(x), \mu_C(x)\}\} = \max\{\mu_A(x), \mu_A(x)\} = \mu_A(x).$$

If $\mu_C(x) \leq \mu_A(x) \leq \mu_B(x)$, then

$$\min\{\mu_A(x), \max\{\mu_B(x), \mu_C(x)\}\} = \min\{\mu_A(x), \mu_B(x)\} = \mu_A(x)$$

and

$$\max\{\min\{\mu_A(x), \mu_B(x)\}, \min\{\mu_A(x), \mu_C(x)\}\} = \max\{\mu_A(x), \mu_C(x)\} = \mu_A(x).$$

The case when $\mu_B(x) \leq \mu_A(x) \leq \mu_C(x)$ is similar to the previous case, and we omit
the details. If $\mu_B(x) \leq \mu_C(x) \leq \mu_A(x)$, then

$$\min\{\mu_A(x), \max\{\mu_B(x), \mu_C(x)\}\} = \min\{\mu_A(x), \mu_C(x)\} = \mu_C(x)$$

and

$$\max\{\min\{\mu_A(x), \mu_B(x)\}, \min\{\mu_A(x), \mu_C(x)\}\} = \max\{\mu_B(x), \mu_C(x)\} = \mu_C(x).$$

The case when $\mu_C(x) \leq \mu_B(x) \leq \mu_A(x)$ is similar to the previous case, and we omit
the details. Putting all the cases together proves the desired result.
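
The case analysis in the proof above can be spot-checked numerically. The following Python sketch is our own addition (the function name distributive_law_holds and the grid of sample values are assumptions); it tests the identity between minimum and maximum for every triple of values taken from a small grid in $[0, 1]$. Such a check covers only finitely many values and is no substitute for the proof.

```python
from itertools import product


def distributive_law_holds(values):
    """Check min{a, max{b, c}} = max{min{a, b}, min{a, c}} for all triples
    (a, b, c) drawn from `values`."""
    return all(
        min(a, max(b, c)) == max(min(a, b), min(a, c))
        for a, b, c in product(values, repeat=3)
    )


grid = [i / 4 for i in range(5)]     # 0.0, 0.25, 0.5, 0.75, 1.0
print(distributive_law_holds(grid))  # True
```

A similar function, with the roles of min and max or of the algebraic sum and product swapped in, can be used to test other conjectured laws before attempting to prove them.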


The reader’s task is to conjecture and prove as many results as possible about the 
operations on fuzzy subsets. Which of the results in Lemma 3.2.4 and Theorem 3.3.3 
have analogs for the union and intersection of fuzzy subsets, or for the algebraic sum 
and algebraic product of fuzzy subsets? Can similar operations be defined for indexed 
families of fuzzy subsets, analogously to what we saw in Section 3.4? 

See [BG95] and [Zim96] for further discussion of fuzzy sets and their applica- 
tions. 


8.8 You Are the Professor 


One of the best ways to learn something is to try to explain it to someone else. Now 
that you have had the opportunity to formulate and write many proofs, you are invited 
to take on the role of the professor in a class that teaches proofs. At the end of this 
section are a number of attempted proofs, all of which are actual homework exercises 
submitted by students. Every one of these proofs has problems, some large and some 
small, and your role as professor will be to critique these proofs. 

In order to help you keep in mind what you need to look for as you examine 
these proofs, we have provided a summary of some of the common mistakes that 
students make in writing proofs. Try to spot as many of these mistakes as possible in 
the proofs provided below. And, of course, try to avoid these mistakes in your own 
proofs. 


1. Incomplete Sentences, Undefined Symbols and Other Writing Problems 


Everyone, including the most experienced mathematician, makes honest mathemat- 
ical errors, but there is no excuse for careless writing. The ideas in mathematics 
are sometimes difficult, and there is no reason to make matters worse by taking al- 
ready challenging mathematical concepts and making them even harder to follow by 
phrasing them with incomplete sentences and other grammatical mistakes, by using 
undefined symbols for variables or by engaging in other forms of sloppy writing. 
Mathematics must be written carefully, and in proper English (or whatever language 
you use), no differently from any other writing. See Section 2.6 for more about writ- 
ing mathematics. 
2. Quantifier Problems


The importance of using quantifiers correctly in proofs was discussed in detail in 
Section 2.5. It is not possible to pay too much attention to the proper use of quanti- 
fiers. 


3. Failure to Strategize 


In contrast to the exercises in an introductory course such as calculus, where it is 
possible for a good student to write the correct solutions by simply starting with the 
hypothesis and working things out along the way, for more advanced material such as 
in this text, it is crucial to strategize the outline of the proposed proof before working 
out the details. Before going on a long road trip to an unfamiliar place, one first 
gets directions and looks at a map before commencing to drive; one would not start 
driving in whatever direction the car happened to be parked, and then start worrying 
about the directions after driving for a few hours. The same is true for mathematical 
proofs—first one needs to know the strategy, and only then does one work on the 
details. Figuring out a good strategy for a particular proof often takes no less effort 
than figuring out the details of the proof. 


4. Incorrect Strategy 


The only way to prove something is to do whatever is required to achieve the goal of 
the proof. For example, suppose that we need to prove that a function $f\colon A \to B$ is
injective. Then the definition of injectivity needs to be used precisely as stated, and
doing so leads to the proper strategy for such proofs, which is to let $x, y \in A$ and to
assume $f(x) = f(y)$, and then to deduce that $x = y$. Whatever hypotheses might be
assumed about the function $f$ (and something must be assumed, because not every
function is injective), the overall strategy for the proof must be the one just men- 
tioned for proving that a function is injective. In general, the strategy for a proof is 
determined by what is being proved—and not by what is being assumed. Somewhere 
in the proof the hypotheses are going to be used (and if not then the hypotheses are 
not necessary), but the nature of the proof is guided not by the hypotheses, but by the 
goal of the proof. 


5. Missing or Disorganized Ideas 


A proof is not simply a collection of arguments, it is a collection of arguments in the 
correct logical order, starting with the hypotheses and proceeding in a logical step- 
by-step fashion to the conclusion. If a proof is missing steps it will not be complete, 
and even if one has all the right ideas for a proof, these ideas will not add up to a 
valid proof if they are not presented in the correct logical order. 
6. Scratch Work Substituted for the Actual Proof


Scratch work for a proof can be backwards, forwards or any combination thereof, 
and it is often not written in proper sentences and with correct grammar. The actual 
proof should be written properly, and must start with the hypotheses and end with 
the conclusion. It is therefore important to distinguish between the scratch work and 
the actual proof, which might have little resemblance to each other. Ultimately, what 
counts in a proof is whether the final draft stands on its own; what one does during 
scratch work is important to the person who does it, but it is not part of the actual 
proof. 


7. Failure to Check the Final Draft 


The last stage of writing a proof is to read over the proposed final draft as if it were 
written by someone else, to see if it works as written. Does every step follow from 
the previous step? Are all symbols appropriately defined? Is a valid strategy being 
followed? Are the definitions being used correctly? Is the grammar correct? Is the 
proof clearly written? If the answer to any of these questions is “no,” then the proof 
needs revision. 
Read and comment upon the following attempted proofs as if you are the
professor in a class that teaches proofs. Your comments should indicate what
is wrong, and give suggestions for improvement. Some of these proofs are
wildly incorrect, and others are mostly correct, though with small errors of
content or writing style. Photocopy the proofs or download them at
http://math.bard.edu/bloch/you_are_the_prof.pdf, take a red pen and
go at them mercilessly.


Exercise 8.8.1. [Same as Exercise 2.5.5 (1)] Prove or give a counterexample to the
following statement: For each real number $x$, there exists a real number $y$ such that
$e^x - y > 0$.


Proof (A). The statement is true. For any $x$ we can choose $y = 0$. Since
$0 < e^x$ for all $x$, we have that for all $x$ we can choose a $y$ such that
$e^x - y > 0$.


Proof (B). The statement is true. Let $y = 0$. Since $x \in \mathbb{R}$, therefore
$e^x - y > 0$ for each $x$.


Proof (C). The statement is true. Let $x \in \mathbb{R}$. For all $x$, $e^x > 0$. Let $y = 0$.
For all $x$, $e^x - y > 0$.


Exercise 8.8.2. [Same as Exercise 5.3.4 (1)] Let $A$ and $B$ be sets, and let $f\colon A \to B$
be a function. Let $\sim$ be the relation on $A$ defined by $x \sim y$ if and only if $f(x) = f(y)$,
for all $x, y \in A$. Prove that $\sim$ is an equivalence relation.


Proof (A). We will prove that $\sim$ is reflexive. Suppose that $x \sim x$. Then
$f(x) = f(x)$, for $x \in A$. Suppose that $f(x) = f(x)$, then $x \sim x$. Hence $\sim$
is reflexive.


Proof (B). We will prove that $\sim$ is symmetric. Suppose that $f(x) = f(y)$.
Then $f(y) = f(x)$. So $x \sim y$ and $y \sim x$. Now suppose that $f(x) \neq f(y)$.
Then $f(y) \neq f(x)$. So $x \nsim y$ and $y \nsim x$. So $x \sim y$ if and only if $y \sim x$;
that is, $\sim$ is symmetric.


Proof (C). We will prove that $\sim$ is transitive. Since $f(x) = f(x)$, $x \sim x$
so $\sim$ is reflexive. If $z \in A$ and $f(x) = f(y) = f(z)$ then $x \sim y$ and $y \sim z$
implies $x \sim z$, so $\sim$ is transitive.
Exercise 8.8.3. [Same as Exercise 3.3.11] Let $X$ be a set, and let $A, B, C \subseteq X$ be
subsets. Suppose that $A \cap B = A \cap C$, and that $(X - A) \cap B = (X - A) \cap C$. Prove that
$B = C$.


Proof (A). First, we show that $B \subseteq C$. Let $p \in A \cap B$. This means that
$p \in A$ and $p \in B$. Since $A \cap B = A \cap C$, $A \cap B \subseteq A \cap C$ and $A \cap C \subseteq A \cap B$.
That means that $p \in A \cap B$ implies $p \in A \cap C$. Since $p \in A \cap C$, it follows
that $p \in C$. Because $p \in A \cap B$ means that $p \in B$, $p \in B$ implies $p \in C$.
Therefore $B \subseteq C$.

Second, we show that $C \subseteq B$. Let $p \in (X - A) \cap C$. This means that
$p \in C$. Since $(X - A) \cap C = (X - A) \cap B$, we know that $(X - A) \cap C \subseteq
(X - A) \cap B$. This means that $p \in (X - A) \cap C$ implies $p \in (X - A) \cap B$.
Since $p \in (X - A) \cap B$, it can be said that $p \in B$. Since $p \in (X - A) \cap C$
means $p \in C$, $p \in C$ implies $p \in B$. Therefore $C \subseteq B$.

We have shown both that $B \subseteq C$ and $C \subseteq B$. Therefore, $B = C$.


Proof (B). First, I will show that $B \subseteq C$. Let $x \in B$.

Considering $A \cap B = A \cap C$, $x \in A \cap B$ implies that $x \in A \cap C$. According
to Theorem 3.3.3 (1), $A \cap B \subseteq B$ and $A \cap C \subseteq C$. Since $A \cap B = A \cap C$,
it can be said that $A \cap B \subseteq C$. Since $x \in A \cap B$ and therefore $x \in B$, and
$A \cap B \subseteq C$, $x$ is therefore also an element of $C$ in the case that $A \cap B =
A \cap C$.

Likewise, $(X - A) \cap B = (X - A) \cap C$ implies that a given element $x$
exists in $(X - A) \cap B$ and $(X - A) \cap C$. Therefore $x \notin A$ and $x \in B$. Also,
according to Theorem 3.3.3, $(X - A) \cap C \subseteq C$. Since $(X - A) \cap C = (X - A) \cap B$,
then $(X - A) \cap B \subseteq C$. This implies that there is an element $x \in
(X - A) \cap B$ and $x \in C$. The phrase $x \in (X - A) \cap B$ can be further broken
down to $x \notin A$ and $x \in B$. Therefore $B \subseteq C$, regardless of whether $x \in A$
or $x \notin A$.

To prove that $C \subseteq B$, it suffices to show that element $y \in C$ implies
$y \in B$. Take $(X - A) \cap B = (X - A) \cap C$. Then $y \in (X - A) \cap B$, and thus
$y \in B$ and $y \notin A$. According to Theorem 3.3.3, $(X - A) \cap B \subseteq B$. Since
$(X - A) \cap B = (X - A) \cap C$, then $(X - A) \cap C \subseteq B$. This implies that $y \notin A$
and $y \in C$. Also $y \in B$. Therefore, $C \subseteq B$. If $B \subseteq C$ and $C \subseteq B$, then $B = C$.


Proof (C). Let $x \in A \cap B$. Then $x \in A$ and $x \in B$. Also let $x \in A \cap C$, then
$x \in A$ and $x \in C$. This shows that $x \in A$, $B$ and $C$. Since $(X - A) \cap B =
(X - A) \cap C$, then $x \in X$ and $x \notin A$ and $x \in B$ and $x \in X$ and $x \notin A$ and $x \in C$
are equivalent. Therefore $x \in X$, $B$, and $C$ and $x \notin A$. By Theorem 3.3.3,
$A \cap B \subseteq A$ and $A \cap B \subseteq B$, and $A \cap C \subseteq A$ and $A \cap C \subseteq C$. However, since
$x \notin A$, then $B = C$.
Exercise 8.8.4. [Same as Exercise 4.2.11] Let $A$ and $B$ be sets, let $P, Q \subseteq A$ be subsets
and let $f\colon A \to B$ be a function.


(1) Prove that $f(P) - f(Q) \subseteq f(P - Q)$.
(2) Is it necessarily the case that $f(P - Q) \subseteq f(P) - f(Q)$? Give a proof or a
counterexample.


Proof (A).


(1). Let $b \in B$. By Definition 4.2.1, there is some $x \in f(P) - f(Q)$ such
that $b \in B$ is also $b \in f(P) - f(Q)$. By definition of $f(P)$, there is some
$p \in P$ and $q \in Q$ such that $b \in f(p)$ and $b \notin f(q)$. This means that
there is $a \in A$ such that $a \in p$ and $a \notin q$. Thus $a \in p - q$, and therefore
$b \in f(p - q)$. Thus, $x \in f(P - Q)$. Thus $f(P) - f(Q) \subseteq f(P - Q)$.


(2). It is also the case that $f(P - Q) \subseteq f(P) - f(Q)$. Let $b \in B$. By
Definition 4.2.1, there exists $x$ such that $x \in f(P - Q)$ in which there exist
elements $p \in P$ and $q \in Q$ such that $b \in f(p - q)$. Therefore $b \in f(p)$ and
$b \notin f(q)$. Therefore $x \in f(P)$ and $x \notin f(Q)$, and thus $x \in f(P) - f(Q)$.
Therefore $x \in f(P - Q)$ implies that $x \in f(P) - f(Q)$. Thus
$f(P - Q) \subseteq f(P) - f(Q)$.


Proof (B).


(1). Let $f(x) \in f(P) - f(Q)$. This implies that $f(x) \in f(P)$ and $f(x) \notin
f(Q)$, and thus that $x \in P$ and $x \notin Q$. It follows that $x \in P - Q$ and that
$f(x) \in f(P - Q)$.

(2). Let $f(y) \in f(P - Q)$. This implies that $y \in P - Q$ and thus that $y \in P$
and $y \notin Q$. Hence, $f(y) \in f(P)$ and $f(y) \notin f(Q)$, and $f(y) \in f(P) -
f(Q)$.


Proof (C).


(1). Let $f\colon A \to B$ be a function, and let $x \in f(P) - f(Q)$. So, by the
definition of the image, it follows that $f(x) \in P$ and $f(x) \notin Q$. Therefore
by definition of the image, $f(x) \in P - Q$, and $x \in f(P - Q)$. So
$f(P) - f(Q) \subseteq f(P - Q)$.


(2). Now let $x \in f(P - Q)$. So, by the definition of the image, $f(x) \in
P - Q$. This means that $f(x) \in P$ and $f(x) \notin Q$. This implies that $x \in f(P)$
and $x \notin f(Q)$. Therefore $x \in f(P) - f(Q)$. It follows that
$f(P - Q) \subseteq f(P) - f(Q)$.
Proof (D).


(1). Let $k \in f(P) - f(Q)$. This means that $k \in f(P)$ and $k \notin f(Q)$. By
Definition 4.2.1 this means that $k = f(p)$ for all $p \in P$ and that $k \neq f(q)$
for any $q \in Q$. Since $k = f(p)$ for all $p \in P$ and $k \neq f(q)$ for all $q \in Q$,
it follows that $f(p) \neq f(q)$. Because of Definition 4.2.1, which states
that in a function $f\colon A \to B$ each $a \in A$ maps to only one $b \in B$ and
each $b \in B$ has only one $a \in A$ that maps to it, the fact that $f(p) \neq f(q)$
implies that $p \neq q$. Because $p \neq q$ for all $q \in Q$, it follows that $p \notin Q$.
Since $p \in P$ and $p \notin Q$, we can derive that $p \in P - Q$. This means that
$f(p) \in f(P - Q)$, which means that $k \in f(P - Q)$. Since $k \in f(P) - f(Q)$
implies $k \in f(P - Q)$, it follows that $f(P) - f(Q) \subseteq f(P - Q)$.


(2). Let $b \in f(P - Q)$. By Definition 4.2.1, there is some $r \in P - Q$
such that $b = f(r)$. Also, $r \in P$ and $r \notin Q$ by Definition 3.3.6. Because
$b \in B$, $b = f(r)$, and $r \in P$, we know from Definition 4.2.1 (1) that
$b \in f(P)$. Similarly, since $b \in B$, $b = f(r)$, and $r \notin Q$, we know that
$b \notin f(Q)$. Since $r \in f(P)$ and $r \notin f(Q)$, then $r \in f(P) - f(Q)$ from
Definition 3.3.6. Therefore $f(P - Q) \subseteq f(P) - f(Q)$.


Proof (E).


(1). Let $x \in f(P) - f(Q)$. Then $x \in f(P)$ but $x \notin f(Q)$. By the definition
of image, $x \in B$, and $x = f(p)$ for all $p \in P$. But $x \neq f(q)$ for any $q \in Q$.

For convenience, we assign $P - Q = Z$. If $x \in f(Z)$, then $x = f(z)$
for $z \in Z$. We know that for $p \in P$, $p \notin Q$ because that would make
$x = f(q)$ true, which we have shown is a false statement. Therefore,
$x = f(p)$ for $p \in P - Q$. It follows that $x \in f(P - Q)$. We have shown
that $x \in f(P) - f(Q)$ and $x \in f(P - Q)$. Hence, $f(P) - f(Q) \subseteq f(P - Q)$.


(2). First, we prove that if $x = f(a)$ for some $a \in P$ and $a \notin Q$ then $x \neq
f(b)$ for any $b \in Q$. This is a proof by contradiction. Suppose $x = f(a)$
for some $a \in P$ and $a \notin Q$, and that $x = f(b)$ for some $b \in Q$. We have
reached our contradiction since $x = f(a)$ for some $a \in P$ and $a \notin Q$. This
contradicts the fact that $x = f(b)$ for some $b \in Q$. Therefore $x \neq f(b)$
for any $b \in Q$.

We now prove that $f(P - Q) \subseteq f(P) - f(Q)$. Let $x \in f(P - Q)$. We
will show that $x \in f(P) - f(Q)$. Hence, $x = f(a)$ for some $a \in P - Q$,
by definition of image. It follows that $x = f(a)$ for some $a \in P$ and
$a \notin Q$ from the definition of set difference. Hence, $x \neq f(b)$ for any
$b \in Q$ by the previous paragraph. Notice that $x = f(a)$ for some $a \in
P$ and $x \neq f(b)$ for any $b \in Q$. Thus, $x \in f(P)$ and $x \notin f(q)$ by the
definition of image. It follows that $x \in f(P) - f(Q)$ from the definition
of set difference. Therefore, we conclude that $f(P - Q) \subseteq f(P) - f(Q)$.
Exercise 8.8.5. [Same as Exercise 5.1.11 (1)] Let $A$ and $B$ be sets, let $R$ and $S$ be
relations on $A$ and $B$, respectively, and let $f\colon A \to B$ be a function. The function $f$ is
relation preserving if $x \mathrel{R} y$ if and only if $f(x) \mathrel{S} f(y)$, for all $x, y \in A$.

Suppose that $f$ is bijective and relation preserving. Prove that $f^{-1}$ is relation
preserving.


Proof (A). Let $p, q \in B$ such that $p \mathrel{S} q$. Let $m, n \in A$ such that $f^{-1}(p) =
m$ and $f^{-1}(q) = n$. Note that $f(m) = p$ and $f(n) = q$. Hence $f(m) \mathrel{S}
f(n)$. Since $f$ is relation preserving, thus $f(m) \mathrel{S} f(n)$ implies $m \mathrel{S} n$ for
all $m, n \in A$. Thus $f^{-1}(p) \mathrel{R} f^{-1}(q)$. Therefore, if $x \mathrel{S} y$, then $f^{-1}(x) \mathrel{R}
f^{-1}(y)$ for all $x, y \in B$.

Suppose $f^{-1}(p) \mathrel{R} f^{-1}(q)$ for some $p, q \in B$. Let $m, n \in A$ such that
$f^{-1}(p) = m$ and $f^{-1}(q) = n$. Thus $m \mathrel{R} n$. Since $f$ is relation preserving,
thus $m \mathrel{R} n$ implies $f(m) \mathrel{S} f(n)$ for all $m, n \in A$. Observe that $f(m) = p$
and $f(n) = q$, hence $p \mathrel{S} q$. Therefore if $f^{-1}(x) \mathrel{R} f^{-1}(y)$, then $x \mathrel{S} y$ for
all $x, y \in B$.

Therefore $f^{-1}$ is relation preserving.


Proof (B). Since $f$ is relationship preserving, we know that $x \mathrel{R} y$ if and
only if $f(x) \mathrel{S} f(y)$. So $f(x) \mathrel{S} f(y)$ if and only if $x \mathrel{R} y$, for all $x, y \in A$
and all $f(x), f(y) \in B$. So $f^{-1}$ is relationship preserving.


Proof (C). Define $f^{-1}$ as the function $g\colon B \to A$. Let $w, z \in B$. Because $g$
is the inverse of $f$, then let $w = f(x)$ and let $z = f(y)$. Thus $w = f(x)$ and
$z = f(y)$. Because $f(x) \mathrel{S} f(y)$, then $w \mathrel{S} z$. Then $f(w) = f^{-1}(f(x)) = x$
and $f(z) = f^{-1}(f(y)) = y$. Because $x \mathrel{R} y$, then $f(w) \mathrel{R} f(z)$. Therefore
if $w \mathrel{S} z$ if and only if $f(w) \mathrel{R} f(z)$. Therefore $f^{-1}$ is relation preserving.


Proof (D). Suppose $f^{-1}\colon B \to A$. Let $m, n \in B$. Suppose $m \mathrel{S} n$. Because
$f$ is both injective and relation preserving for all $x, y \in A$, $f(x) \mathrel{S} f(y)$.
Hence, $f(x)$ and $f(y)$ both correspond to elements in $B$, namely, some
$m, n \in B$. Therefore, because $f(x) \mathrel{S} f(y)$ implies $x \mathrel{R} y$, $m \mathrel{S} n$ implies
$f(m) \mathrel{R} f(n)$ for all $m, n \in B$. Suppose $f(m) \mathrel{R} f(n)$. Hence, $f(m)$ and
$f(n)$ both correspond to elements in $A$ namely, some $x, y \in A$. Therefore,
because $x \mathrel{R} y$ implies $f(x) \mathrel{S} f(y)$, $f(m) \mathrel{R} f(n)$ implies $m \mathrel{S} n$. Therefore
$f^{-1}$ is relation preserving.
Proof (E). We begin by assuming that $f\colon A \to B$ is a bijective relation
preserving function. Since $f$ is bijective, we know there exists a
bijective inverse function $f^{-1}\colon B \to A$ by Theorem 4.4.5. Since $f^{-1}$ is
bijective, by definition we know it is both injective and surjective. Let
$p, q \in B$ and suppose that $f^{-1}(p) = f^{-1}(q)$. Since $f^{-1}$ is injective, then
$f^{-1}(p) = f^{-1}(q)$ implies that $p = q$ for all $p, q \in B$. Hence $f^{-1}(p) \mathrel{R}
f^{-1}(q)$ implies $p \mathrel{S} q$ for all $p, q \in B$. Let $c \in A$. Let $a = b$ for all $a, b \in B$.
Since $f^{-1}$ is surjective there exists some $a$, $b$ such that $f(a) = c = f(b)$.
Therefore, $f(a) = f(b)$. Hence $a \mathrel{S} b$ implies $f^{-1}(a) \mathrel{R} f^{-1}(b)$ for all
$a, b \in B$. Hence $x \mathrel{S} y$ if and only if $f^{-1}(x) \mathrel{R} f^{-1}(y)$ for all $x, y \in B$. Thus
$f^{-1}$ is relation preserving.


Appendix 


Properties of Numbers 


Throughout this book we have assumed an informal familiarity with the standard 
number systems used in high school mathematics. In this appendix we briefly sum- 
marize some of the commonly used properties of these number systems. A rigorous 
treatment of these number systems, including proofs of everything stated in this
appendix, can be found in [Blo11, Chapters 1 and 2].

All the numbers we deal with in this book are real numbers. In particular, we 
do not make use of complex numbers. We standardly think of the real numbers as 
forming the real number line, which extends infinitely in both positive and negative 
directions. The real numbers have the operations addition, multiplication, negation 
and reciprocal, and the relations $<$ and $\leq$. (The real numbers also have the operations
subtraction and division, but we do not focus on them in this appendix because they 
can be defined in terms of addition and multiplication, respectively.) Among the most 
important properties of the real numbers are the following. 


Theorem A.1. Let $x$, $y$ and $z$ be real numbers.

1. $(x + y) + z = x + (y + z)$ and $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ (Associative Laws).
2. $x + y = y + x$ and $x \cdot y = y \cdot x$ (Commutative Laws).
3. $x + 0 = x$ and $x \cdot 1 = x$ (Identity Laws).
4. $x + (-x) = 0$ (Inverses Law).
5. If $x \neq 0$, then $x \cdot x^{-1} = 1$ (Inverses Law).
6. If $x + z = y + z$, then $x = y$ (Cancellation Law).
7. If $z \neq 0$, then $x \cdot z = y \cdot z$ if and only if $x = y$ (Cancellation Law).
8. $x \cdot (y + z) = (x \cdot y) + (x \cdot z)$ (Distributive Law).
9. $-(-x) = x$ (Double negation).
10. $-(x + y) = (-x) + (-y)$.
11. $(-x) \cdot y = -(x \cdot y) = x \cdot (-y)$.
12. If $x < y$ and $y < z$, then $x < z$ (Transitive Law).
13. Precisely one of the following holds: either $x < y$, or $x = y$, or $x > y$ (Trichotomy Law).
14. If $x \leq y$ and $y \leq x$, then $x = y$ (Antisymmetry Law).
15. $x < y$ if and only if $x + z < y + z$.
16. If $z > 0$, then $x < y$ if and only if $x \cdot z < y \cdot z$.

We mention here two additional facts about the real numbers, which we will need 
in Section 7.8, though nowhere else. These facts involve the absolute value of real 
numbers, which was defined in Exercise 2.4.9. A proof of the first of these facts may 
be found in [Blo11, Lemma 2.3.9]; the second fact can be deduced from the first
without too much difficulty. 


Theorem A.2. Let $x, y \in \mathbb{R}$.


1. $|x + y| \leq |x| + |y|$ (Triangle Inequality).
2. $|x| - |y| \leq |x + y|$ and $|x| - |y| \leq |x - y|$.


There are three particularly useful subsets of the real numbers, namely, the natu- 
ral numbers, the integers and the rational numbers. 
The set of natural numbers is the set 


$$\mathbb{N} = \{1, 2, 3, 4, \ldots\}.$$


The sum and product of any two natural numbers is also a natural number, though the 
difference and quotient of two natural numbers need not be a natural number. Being 
real numbers, the natural numbers satisfy all the properties of real numbers listed 
above. The natural numbers also satisfy a number of special properties not satisfied 
by the entire set of real numbers, for example the ability to do proof by induction; 
see Section 6.2 for more about the natural numbers. 

We mention here one additional property of the natural numbers, which we will 
need in Section 7.8, though again nowhere else. This property, rather than being 
about the natural numbers themselves, refers to the way that the natural numbers sit
inside the real numbers.


Theorem A.3. Let $x \in \mathbb{R}$. Then there is some $n \in \mathbb{N}$ such that $x < n$.


This theorem may seem intuitively obvious, but it is not trivial to prove, because 
its proof relies upon the Least Upper Bound Property of the real numbers. It would 
take us too far afield to discuss the Least Upper Bound Property, but we will mention 
that it is the property of the set of real numbers that distinguishes that set from the
set of rational numbers; there is no difference between these two sets in terms of 
algebraic properties of addition, subtraction, multiplication and division. See [Blo11, 
Section 2.6] for a discussion of the Least Upper Bound Property in general, and a 
proof of Theorem A.3 in particular. 
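
For a specific number x that a computer can represent exactly (an integer, a float or a fraction), a natural number n with x < n can be written down directly. The sketch below illustrates the statement of Theorem A.3 only, not its proof, which for arbitrary real numbers rests on the Least Upper Bound Property; the function name natural_above is an assumption made for this example.

```python
import math
from fractions import Fraction


def natural_above(x):
    """Return some n in N = {1, 2, 3, ...} with x < n."""
    # floor(x) <= x < floor(x) + 1, so floor(x) + 1 always exceeds x.
    # Taking the maximum with 1 keeps the result a natural number
    # when x is zero or negative.
    return max(1, math.floor(x) + 1)
```

For example, natural_above(Fraction(22, 7)) returns 4, and natural_above(-5.5) returns 1.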

The set of integers is the set 


Z = {..., −3, −2, −1, 0, 1, 2, 3, ...}. 


The sum, difference and product of any two integers is also an integer, though the 
quotient of two integers need not be an integer. Being real numbers, the integers 
satisfy all the properties of real numbers listed above. 


We will need two additional properties of the integers; these properties do not 
hold for all real numbers. Our first property, given in the following theorem, is very 
evident intuitively, though it requires a proof; see [Blo11, Exercise 2.4.4] for details. 


Theorem A.4. Let a, b ∈ Z. If ab = 1, then a = 1 and b = 1, or a = −1 and b = −1. 


Our second property of the integers, which is much less obviously true than the 
previous property, is known as the Division Algorithm, though it is not an algorithm 
(the name is simply historical). See [Ros05, Section 1.5] for a proof. 


Theorem A.5 (Division Algorithm). Let a, b ∈ Z. Suppose that b ≠ 0. Then there 
are unique q, r ∈ Z such that a = qb + r and 0 ≤ r < |b|. 
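
Although the theorem itself only asserts existence and uniqueness, the pair q, r is easy to compute in practice. The following Python sketch is one way to do so, under the assumption that Python's built-in divmod is available; because divmod returns a remainder with the same sign as b, a small adjustment is needed when b is negative to obtain 0 ≤ r < |b|. The function name division_algorithm is hypothetical.

```python
def division_algorithm(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < abs(b), for b != 0."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # Python's divmod gives a == q*b + r with r zero or of the same sign as b.
    q, r = divmod(a, b)
    if r < 0:
        # This happens only when b < 0; shift r up by |b| = -b and
        # compensate in q so that a == q*b + r still holds.
        q, r = q + 1, r - b
    return q, r
```

For example, division_algorithm(7, -2) returns (-3, 1), because 7 = (−3)(−2) + 1 and 0 ≤ 1 < 2.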


The set of rational numbers, denoted Q, is the set of all real numbers that can 
be expressed as fractions. That is, a real number x is rational if x = a/b for some 
integers a and b, where b ≠ 0. Clearly, a rational number can be represented in more 
than one way as a fraction, for example 1/2 = 2/4. However, as we now state, there 
is always a particularly convenient representation of each rational number, namely, 
writing it in “lowest terms.” This latter concept is phrased using the notion of integers 
being relatively prime, as defined in Exercise 2.4.3. The following theorem can be 
proved using the Fundamental Theorem of Arithmetic, which is found in [Ros05, 
Section 3.5]; a proof of the following theorem is also found in [Olm62, Section 402 
and Section 404]. 


Theorem A.6. Let x ∈ Q. Suppose that x ≠ 0. There are a, b ∈ Z such that x = a/b and 
a and b are relatively prime. The integers a and b are unique up to negation. 
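
A fraction can be put into lowest terms by dividing the numerator and the denominator by their greatest common divisor. The Python sketch below illustrates this using math.gcd from the standard library; the function name lowest_terms is hypothetical, and the choice of a positive denominator is one assumed convention for picking between the two representations that differ by negation.

```python
from math import gcd


def lowest_terms(a, b):
    """Return (p, q) with p/q == a/b, p and q relatively prime, and q > 0."""
    if b == 0:
        raise ValueError("denominator must be nonzero")
    g = gcd(a, b)            # nonnegative, and positive here because b != 0
    p, q = a // g, b // g    # both divisions are exact
    if q < 0:
        # Of the two representations (p, q) and (-p, -q), pick the one
        # with a positive denominator.
        p, q = -p, -q
    return p, q
```

For example, lowest_terms(2, 4) returns (1, 2), and lowest_terms(6, -4) returns (-3, 2).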


It can be shown that the rational numbers are precisely those real numbers that 
have decimal expansions that are either repeating, or are zero beyond some point; 
see [Blo11, Section 2.8] for a proof. The sum, difference, product and quotient of 
any two rational numbers is also a rational number, except that we cannot divide by 
zero. The rational numbers are not all the real numbers; for example, the number √2 
is not rational, as is proved in Theorem 2.3.5. Again, being real numbers, the rational 
numbers satisfy all the properties of real numbers listed above. 

The rational numbers also satisfy some additional nice properties, for example, 
they are “dense” in the real number line, which means that between any two real 
numbers, no matter how close, we can always find a rational number; see [Blo11, 
Theorem 2.6.13] for a proof. We will rarely make use of such facts. 
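
One standard way to find a rational number between x and y, where x < y, is to choose n ∈ N with 1/n < y − x (possible by Theorem A.3) and then take the smallest integer m with m > nx; the fraction m/n then satisfies x < m/n < y. The Python sketch below follows that construction and is an illustration only: the function name rational_between is hypothetical, and because a computer stores its inputs as exact rational values, the code demonstrates the construction rather than the theorem for arbitrary real numbers.

```python
import math
from fractions import Fraction


def rational_between(x, y):
    """Return a Fraction m/n with x < m/n < y, assuming x < y."""
    # Fraction(...) converts ints, floats and Fractions exactly.
    x, y = Fraction(x), Fraction(y)
    if not x < y:
        raise ValueError("need x < y")
    # n > 1/(y - x), so 1/n < y - x.
    n = math.floor(1 / (y - x)) + 1
    # m is the smallest integer greater than n*x, so m - 1 <= n*x < m.
    m = math.floor(n * x) + 1
    # Then x < m/n <= x + 1/n < y.
    return Fraction(m, n)
```

For example, rational_between(Fraction(1, 3), Fraction(1, 2)) returns Fraction(3, 7).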